Abstract. We derive a central limit theorem for the maximum of a sum of high dimensional random vectors. Specifically, we establish conditions under which the distribution of the maximum is approximated by that of the maximum of a sum of Gaussian random vectors with the same covariance matrices as the original vectors. The key innovation of this result is that it applies even when the dimension of the random vectors (p) is large compared to the sample size (n); in fact, p can be much larger than n. We also show that the distribution of the maximum of a sum of random vectors with unknown covariance matrices can be consistently estimated by the distribution of the maximum of a sum of conditional Gaussian random vectors obtained by multiplying the original vectors with i.i.d. Gaussian multipliers. This is the multiplier bootstrap procedure. Here too, p can be large or even much larger than n. These distributional approximations, either Gaussian or conditional Gaussian, yield a high-quality approximation to the distribution of the original maximum, often with approximation error decreasing polynomially in the sample size, and hence are of interest in many applications. We demonstrate how our central limit theorem and the multiplier bootstrap can be used for high dimensional estimation, multiple hypothesis testing, and adaptive specification testing. All these results contain non-asymptotic bounds on approximation errors.



1. Introduction 

Let $x_1, \dots, x_n$ be independent random vectors in $\mathbb{R}^p$, with each $x_i$ having coordinates denoted by $x_{ij}$, that is, $x_i = (x_{i1}, \dots, x_{ip})'$. Suppose that each $x_i$ is centered, namely $\mathrm{E}[x_i] = 0$, and has a finite covariance matrix $\mathrm{E}[x_i x_i']$. Consider the rescaled average:

$$X := (X_1, \dots, X_p)' := \frac{1}{\sqrt{n}} \sum_{i=1}^n x_i. \tag{1}$$

Our goal is to obtain a distributional approximation for the statistic $T_0$ defined as the maximum coordinate of vector $X$:

$$T_0 := \max_{1 \le j \le p} X_j.$$

The distribution of $T_0$ is of interest in many applications. When $p$ is fixed, this distribution can be approximated by the classical Central Limit Theorem (CLT) applied to $X$. However, in modern applications (cf. [...]), $p$ is often comparable to or even larger than $n$, and the classical CLT does not apply in such cases. This paper provides a tractable approximation to the distribution of $T_0$ when $p$ is large and possibly much larger than $n$.

The first main result of the paper is the Gaussian approximation theorem, which bounds the Kolmogorov distance between the distributions of $T_0$ and its Gaussian analog $Z_0$. Specifically, let $y_1, \dots, y_n$ be independent centered Gaussian random vectors in $\mathbb{R}^p$ such that each $y_i$ has the same covariance matrix as $x_i$, namely $y_i \sim N(0, \mathrm{E}[x_i x_i'])$. Consider the rescaled average of these vectors,

$$Y := (Y_1, \dots, Y_p)' := \frac{1}{\sqrt{n}} \sum_{i=1}^n y_i. \tag{2}$$

Vector $Y$ is the Gaussian analog of $X$ in the sense of sharing the same mean and covariance matrix, namely $\mathrm{E}[X] = \mathrm{E}[Y] = 0$ and $\mathrm{E}[XX'] = \mathrm{E}[YY'] = n^{-1} \sum_{i=1}^n \mathrm{E}[x_i x_i']$. We then define the Gaussian analog $Z_0$ of $T_0$ as the maximum coordinate of vector $Y$:

$$Z_0 := \max_{1 \le j \le p} Y_j. \tag{3}$$

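Our main result shows that, under suitable moment assumptions, as $n \to \infty$ and possibly $p = p_n \to \infty$,

$$\rho := \sup_{t \in \mathbb{R}} \left| \mathrm{P}(T_0 \le t) - \mathrm{P}(Z_0 \le t) \right| \le C n^{-c} \to 0, \tag{4}$$

where constants $c > 0$ and $C > 0$ are independent of $n$.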

Importantly, in (4), $p$ can be large in comparison to $n$ and be nearly as large as $e^{o(n^c)}$ for some $c > 0$. For example, if $x_{ij}$ are uniformly bounded (namely, $|x_{ij}| \le C_1$ for some constant $C_1 > 0$ for all $i$ and $j$), the Kolmogorov distance $\rho$ converges to zero at a polynomial rate whenever $(\log p)^7/n \to 0$ at a polynomial rate. We obtain similar results when $x_{ij}$ are sub-exponential, and even when they are not sub-exponential, under suitable moment assumptions. Figure 1 illustrates the result (4) in a non-sub-exponential example, which is motivated by the analysis of the Dantzig selector of [11] in non-Gaussian settings (see Section 4).

The proof of the Gaussian approximation result (4) builds on a number of technical tools such as Slepian's smart path interpolation (which is related to the solution of Stein's partial differential equation; cf. Appendix E), Stein's leave-one-out method, approximation of maxima by smooth functions (related to "free energy" in spin glasses), and exponential inequalities for self-normalized sums. See, e.g., [...] for an introduction to and discussion of some of these tools. It also critically relies on the anti-concentration and comparison bounds for maxima of Gaussian vectors derived in [18] and restated in this paper as Lemmas 2.1 and 3.1.

Figure 1. P-P plots comparing the distributions of $T_0$ and $Z_0$ in an example motivated by the problem of selecting the penalty level of the Dantzig selector. Here $x_{ij}$ are generated as $x_{ij} = z_{ij}\varepsilon_i$ with $\varepsilon_i \sim t(4)$ (a t-distribution with four degrees of freedom), and $z_{ij}$ are non-stochastic (simulated once using the $U[0,1]$ distribution independently across $i$ and $j$). The dashed line is the 45° line. The distributions of $T_0$ and $Z_0$ are close, as (qualitatively) predicted by the CLT derived in the paper: see Corollaries 2.1 or 2.2. The quality of the Gaussian approximation is particularly good for the tail probabilities, which are most relevant for practical applications.
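To get a feel for (4) and Figure 1, the following Python sketch (our own illustration, not the authors' code) simulates the design described in the caption of Figure 1 and estimates the Kolmogorov distance between $T_0$ and $Z_0$ from Monte Carlo draws. The sample sizes, the number of replications and the helper names `simulate_t0_z0` and `kolmogorov_distance` are assumptions made for this example. Since $\operatorname{Var}(t(4)) = 2$, the Gaussian analog is taken as $y_{ij} = z_{ij}\sqrt{2}\,g_i$ with $g_i \sim N(0,1)$, which has covariance $\mathrm{E}[x_i x_i'] = 2 z_i z_i'$.

```python
import math
import random


def student_t4(rng):
    """Draw one t(4) variable as g / sqrt(chi2_4 / 4)."""
    g = rng.gauss(0.0, 1.0)
    chi2 = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(4))
    return g / math.sqrt(chi2 / 4.0)


def simulate_t0_z0(n, p, reps, seed=0):
    """Monte Carlo draws of T_0 = max_j X_j and of its Gaussian analog Z_0 = max_j Y_j."""
    rng = random.Random(seed)
    # Non-stochastic design z_ij, simulated once from U[0, 1] as in Figure 1.
    z = [[rng.random() for _ in range(p)] for _ in range(n)]
    t0_draws, z0_draws = [], []
    for _ in range(reps):
        eps = [student_t4(rng) for _ in range(n)]
        # Var(t(4)) = 2, so y_i = sqrt(2) g_i z_i has covariance E[x_i x_i'] = 2 z_i z_i'.
        gs = [math.sqrt(2.0) * rng.gauss(0.0, 1.0) for _ in range(n)]
        x_sums = [sum(z[i][j] * eps[i] for i in range(n)) for j in range(p)]
        y_sums = [sum(z[i][j] * gs[i] for i in range(n)) for j in range(p)]
        t0_draws.append(max(x_sums) / math.sqrt(n))
        z0_draws.append(max(y_sums) / math.sqrt(n))
    return t0_draws, z0_draws


def kolmogorov_distance(a, b):
    """Two-sample sup_t |F_a(t) - F_b(t)|, evaluated at the pooled sample points."""
    a, b = sorted(a), sorted(b)
    ia = ib = 0
    dist = 0.0
    for t in sorted(a + b):
        while ia < len(a) and a[ia] <= t:
            ia += 1
        while ib < len(b) and b[ib] <= t:
            ib += 1
        dist = max(dist, abs(ia / len(a) - ib / len(b)))
    return dist


t0, z0 = simulate_t0_z0(n=50, p=50, reps=200)
print("estimated Kolmogorov distance:", round(kolmogorov_distance(t0, z0), 3))
```

The estimate contains Monte Carlo noise of order $1/\sqrt{\text{reps}}$, so only a small printed value, not an exact match with $\rho$, should be expected.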

Our new Gaussian approximation theorem has the following innovative features. First, to the best of our knowledge, this is the first general result that establishes that maxima of sums of random vectors can be approximated in distribution by the maxima of sums of Gaussian random vectors when $p \gg n$, and especially when $p$ is of order $e^{n^c}$ for some $c > 0$. Existing techniques can also lead to results of the form (4) when $p = p_n \to \infty$, but under much stronger conditions on $p$. For example, Yurinskii's coupling implies (4) but requires $p^5/n \to 0$; see Example 17 (Section 10) in [37]. Second, our Gaussian approximation theorem covers cases where $T_0$ does not have a limit distribution as $n \to \infty$ and $p = p_n \to \infty$. In some cases, after a suitable normalization, $T_0$ could have an extreme value distribution as a limit distribution, but the approximation by an extreme value distribution requires some restrictions on the dependence structure among the coordinates of $x_i$. Our result does not require such restrictions. Third, the quality of approximation in (4) is of polynomial order in $n$, which is better than the logarithmic-in-$n$ quality that we could obtain in some (though not all) applications using the approximation of the distribution of $T_0$ by an extreme value distribution (see [33]).
Our result also contributes to the literature on multivariate central limit theorems, which are concerned with conditions under which

$$\left| \mathrm{P}(X \in A) - \mathrm{P}(Y \in A) \right| \to 0, \tag{5}$$

uniformly in a collection of sets $A$, typically all convex sets. Such results were developed, among others, by [..., 7, 14], under conditions of type $p^c/n \to 0$ (also see [13]). These results rely on anti-concentration results for Gaussian random vectors on the $\delta$-expansions of boundaries of arbitrary convex sets $A$ (see [...]). Note that our result also establishes (5), but uniformly for all convex sets of the form $A_{\max} = \{a \in \mathbb{R}^p : \max_{1 \le j \le p} a_j \le t\}$ for $t \in \mathbb{R}$. These sets have a rather special structure that allows us to deal with $p \gg n$: in particular, the concentration of measure on the $\delta$-expansion of the boundary of $A_{\max}$ is at most of order $\delta \sqrt{\log p}$ for Gaussian random vectors with unit variance, as shown in [13] (see also Lemma 2.1). (The relation (5) with $A = A_{\max}$ explains the sense in which we have a CLT, as appearing in the title of the paper.)

Note that the result (4) is immediately useful for inference with statistic $T_0$, even though $\mathrm{P}(Z_0 \le t)$ need not itself converge to a well-behaved distribution function. Indeed, if the covariance matrix $n^{-1}\sum_{i=1}^n \mathrm{E}[x_i x_i']$ is known, then $c_{Z_0}(1-\alpha) := (1-\alpha)$-quantile of $Z_0$ can be computed numerically, and we have

$$\left| \mathrm{P}\big(T_0 \le c_{Z_0}(1-\alpha)\big) - (1-\alpha) \right| \le \rho \le C n^{-c} \to 0. \tag{6}$$
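When the covariance matrix is known, $Z_0$ is the maximum of a single $N(0, \Sigma)$ vector with $\Sigma = n^{-1}\sum_{i=1}^n \mathrm{E}[x_i x_i']$, so $c_{Z_0}(1-\alpha)$ can be approximated by simulation. A minimal Python sketch, assuming $\Sigma$ is positive definite and drawing $Y = L g$ with $L$ a Cholesky factor of $\Sigma$ and $g \sim N(0, I_p)$; the function names and the equicorrelated example matrix are hypothetical choices for illustration:

```python
import math
import random


def cholesky(sigma):
    """Lower-triangular L with L L' = sigma (sigma must be positive definite)."""
    p = len(sigma)
    L = [[0.0] * p for _ in range(p)]
    for j in range(p):
        s = sigma[j][j] - sum(L[j][k] ** 2 for k in range(j))
        L[j][j] = math.sqrt(s)
        for i in range(j + 1, p):
            L[i][j] = (sigma[i][j] - sum(L[i][k] * L[j][k] for k in range(j))) / L[j][j]
    return L


def gaussian_max_quantile(sigma, level, reps=5000, seed=0):
    """Monte Carlo level-quantile of Z_0 = max_j Y_j with Y ~ N(0, sigma)."""
    rng = random.Random(seed)
    L = cholesky(sigma)
    p = len(sigma)
    draws = []
    for _ in range(reps):
        g = [rng.gauss(0.0, 1.0) for _ in range(p)]
        draws.append(max(sum(L[j][k] * g[k] for k in range(j + 1)) for j in range(p)))
    draws.sort()
    # Empirical version of inf{t : P(Z_0 <= t) >= level}.
    return draws[max(math.ceil(level * reps) - 1, 0)]


# Example: p = 20 coordinates with unit variances and common correlation 0.5.
p = 20
sigma = [[1.0 if j == k else 0.5 for k in range(p)] for j in range(p)]
print(gaussian_max_quantile(sigma, 0.95))
```

The simulation error can be made as small as desired by increasing `reps`; it is separate from the approximation error $\rho$ in (6).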

A chief application of this kind arises in determining the penalty level for the Dantzig selector of [11] in high-dimensional regression with non-Gaussian errors, which we examine in Section 4. There, under canonical (homoscedastic) noise, the covariance matrix is known, and so quantiles of $Z_0$ can easily be computed numerically and used for choosing the penalty level. However, if the noise is heteroscedastic, the covariance matrix is no longer known, and this approach is no longer feasible. This motivates our second main result.

The second main result of the paper establishes validity of the multiplier bootstrap for estimating quantiles of $Z_0$ when the covariance matrix $n^{-1}\sum_{i=1}^n \mathrm{E}[x_i x_i']$ is unknown. More precisely, we define the Gaussian-symmetrized version $W_0$ of $T_0$ by multiplying $x_i$ with i.i.d. standard Gaussian random variables $e_1, \dots, e_n$:

$$W_0 := \max_{1 \le j \le p} \frac{1}{\sqrt{n}} \sum_{i=1}^n x_{ij} e_i. \tag{7}$$

We show that the conditional quantiles of $W_0$ given data $(x_i)_{i=1}^n$ consistently estimate the quantiles of $Z_0$ and hence those of $T_0$ (where the notion of consistency used is the one that guarantees asymptotically valid inference). Here the primary factor driving the bootstrap estimation error is the maximum difference between the empirical and population covariance matrices:

$$\Delta := \max_{1 \le j, k \le p} \left| \mathbb{E}_n[x_{ij} x_{ik}] - \bar{\mathrm{E}}[x_{ij} x_{ik}] \right|,$$

which can converge to zero even when $p$ is much larger than $n$. For example, when $x_{ij}$ are uniformly bounded, the multiplier bootstrap is valid for inference if $(\log p)^7/n \to 0$. Earlier related results on the bootstrap in the "$p \to \infty$ but $p/n \to 0$" regime were obtained in [...]; interesting results for the case $p \gg n$ based on concentration inequalities and symmetrization are studied in [...], albeit the approach and results are quite different from those given here. In particular, in [3], either Gaussianity or symmetry in distribution is imposed on the data.
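In applications the population part of $\Delta$ is unknown, but in a simulation it can be computed directly. The sketch below assumes $x_i \sim N(0, I_p)$, so that $\bar{\mathrm{E}}[x_{ij} x_{ik}] = 1\{j = k\}$; the helper name `max_cov_deviation` is hypothetical. With $p$ fixed, the printed values should shrink, up to constants, roughly like $\sqrt{\log p / n}$ as $n$ grows.

```python
import random


def max_cov_deviation(x, pop_cov):
    """Delta = max_{j,k} |E_n[x_ij x_ik] - pop_cov[j][k]| for an n-by-p data list x."""
    n, p = len(x), len(x[0])
    delta = 0.0
    for j in range(p):
        for k in range(j, p):  # both matrices are symmetric, so k >= j suffices
            emp = sum(row[j] * row[k] for row in x) / n
            delta = max(delta, abs(emp - pop_cov[j][k]))
    return delta


rng = random.Random(0)
p = 30
identity = [[1.0 if j == k else 0.0 for k in range(p)] for j in range(p)]
for n in (100, 400, 1600):
    x = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    print(n, round(max_cov_deviation(x, identity), 3))
```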

The key motivating example of our analysis is the high-dimensional sparse regression model. In this model, [11] and [...] assume Gaussian errors to analyze the Dantzig selector and Lasso. Our results show that Gaussianity is not necessary and that the Gaussian-like conclusions hold approximately, with just the fourth moment of the regression errors being bounded. Moreover, our approximation allows us to take into account correlations among the regressors. This leads to a better choice of the penalty level and tighter bounds on performance than those previously available. For example, some of the same goals had been accomplished using moderate deviations for self-normalized sums combined with the union bound [...]. However, the union bound does not take into account correlations among the regressors, and so it may be overly conservative in some applications.

Our results have a broad range of other applications. In addition to the high-dimensional estimation example, we show in the Supplemental Material how to apply our results to multiple hypothesis testing via the step-down method of [40] and to specification testing. In either case, the number of hypotheses to be tested or the number of moment restrictions to be tested can be much larger than the sample size. Lastly, in a companion work ([...]), we are exploring the strong coupling for suprema of general empirical processes based on the methods developed here and maximal inequalities. These results represent a useful complement to the results based on the Hungarian coupling developed by [32, ...] for the entire empirical process and have applications to inference in nonparametric problems such as construction of uniform confidence bands (see, e.g., [...]).



1.1. Organization of the paper. In Section 2 we give the results on Gaussian approximation, and in Section 3 on the multiplier bootstrap. In Section 4 we present an application to the Dantzig selector. Appendices A-D contain proofs for each of these sections, with Appendix A stating auxiliary tools and lemmas. Due to space limitations, we put additional results and proofs into the Supplemental Material, Appendices [...]. In particular, Appendices G and H provide additional applications to multiple hypothesis testing and adaptive specification testing.
1.2. Notation. In what follows, unless otherwise stated, we will assume that $p \ge 3$. In making asymptotic statements we assume that $n \to \infty$, with the understanding that $p$ depends on $n$ and possibly $p \to \infty$ as $n \to \infty$. Constants $c, C, c_1, C_1, c_2, C_2, \dots$ are understood to be independent of $n$. Throughout the paper, $\mathbb{E}_n[\cdot]$ denotes the average over index $1 \le i \le n$, i.e., it simply abbreviates the notation $n^{-1}\sum_{i=1}^n [\cdot]$. E.g., $\mathbb{E}_n[x_{ij}^2] = n^{-1}\sum_{i=1}^n x_{ij}^2$. In addition, $\bar{\mathrm{E}}[\cdot] = \mathbb{E}_n[\mathrm{E}[\cdot]]$. For example, $\bar{\mathrm{E}}[x_{ij}^2] = n^{-1}\sum_{i=1}^n \mathrm{E}[x_{ij}^2]$. For a function $f: \mathbb{R} \to \mathbb{R}$, we write $\partial^k f(x) = \partial^k f(x)/\partial x^k$ for nonnegative integer $k$; for a function $f: \mathbb{R}^p \to \mathbb{R}$, we write $\partial_j f(x) = \partial f(x)/\partial x_j$ for $j = 1, \dots, p$, where $x = (x_1, \dots, x_p)'$. Denote by $C^k(\mathbb{R})$ the class of $k$ times continuously differentiable functions from $\mathbb{R}$ to itself, and denote by $C_b^k(\mathbb{R})$ the class of all functions $f \in C^k(\mathbb{R})$ such that $\sup_{z \in \mathbb{R}} |\partial^j f(z)| < \infty$ for $j = 0, \dots, k$. We write $a \lesssim b$ if $a$ is smaller than or equal to $b$ up to a universal positive constant. For $a, b \in \mathbb{R}$, we write $a \vee b = \max\{a, b\}$.

2. Central Limit Theorems for Maxima of Non-Gaussian Sums 

2.1. Comparison Theorems and Non-Asymptotic Gaussian Approximations. The purpose of this section is to compare and bound the difference between the expectations and distribution functions of the non-Gaussian and Gaussian maxima:

$$T_0 := \max_{1 \le j \le p} X_j \quad \text{and} \quad Z_0 := \max_{1 \le j \le p} Y_j,$$

where vector $X$ is defined in equation (1) and $Y$ in equation (2). Here and in what follows, without loss of generality, we will assume that $(x_i)_{i=1}^n$ and $(y_i)_{i=1}^n$ are independent. The following envelopes and bounds on moments will be used in stating the bounds in Gaussian approximations:

$$S_i := \max_{1 \le j \le p} \left( |x_{ij}| + |y_{ij}| \right), \qquad M_k := \max_{1 \le j \le p} \left( \bar{\mathrm{E}}[|x_{ij}|^k] \right)^{1/k}. \tag{8}$$

The problem of comparing distributions of maxima is of intrinsic difficulty since the maximum function $z = (z_1, \dots, z_p)' \mapsto \max_{1 \le j \le p} z_j$ is non-differentiable. To circumvent the problem, we use a smooth approximation of the maximum function. For $z = (z_1, \dots, z_p)' \in \mathbb{R}^p$, consider the function

$$F_\beta(z) := \beta^{-1} \log\Big( \sum_{j=1}^p \exp(\beta z_j) \Big),$$

which approximates the maximum function, where $\beta > 0$ is the smoothing parameter that controls the level of approximation (we call this function the "smooth max function"). Indeed, since $e^{\beta \max_j z_j} \le \sum_{j=1}^p e^{\beta z_j} \le p\, e^{\beta \max_j z_j}$, an elementary calculation shows that for all $z \in \mathbb{R}^p$,

$$0 \le F_\beta(z) - \max_{1 \le j \le p} z_j \le \beta^{-1} \log p. \tag{9}$$

This smooth max function arises in the definition of "free energy" in spin glasses; see, e.g., [43].
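Inequality (9) is easy to check numerically. The sketch below computes $F_\beta$ with the usual log-sum-exp shift (subtracting $\max_j z_j$ before exponentiating, which leaves $F_\beta$ unchanged but avoids overflow for large $\beta$); the function name `smooth_max` and the test vector are our own choices.

```python
import math


def smooth_max(z, beta):
    """F_beta(z) = beta^{-1} log(sum_j exp(beta z_j)), computed stably."""
    m = max(z)
    # Subtracting the maximum before exponentiating avoids overflow.
    return m + math.log(sum(math.exp(beta * (zj - m)) for zj in z)) / beta


z = [0.3, -1.2, 2.5, 2.4, 0.0]
for beta in (1.0, 10.0, 100.0):
    gap = smooth_max(z, beta) - max(z)
    # Inequality (9): 0 <= F_beta(z) - max_j z_j <= log(p) / beta.
    print(beta, gap, math.log(len(z)) / beta)
```

Larger $\beta$ tightens the approximation in (9) but, as Theorem 2.1 below shows, makes the smooth max function more curved, which is the trade-off behind the choice of $\beta$.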
We start with the following "warm-up" theorem that conveys the main qualitative feature of the problem. Here and in what follows, for a smooth function $g: \mathbb{R} \to \mathbb{R}$, write

$$G_k := \sup_{z \in \mathbb{R}} |\partial^k g(z)|, \quad k \ge 0.$$

Theorem 2.1 (Comparison of Gaussian to Non-Gaussian Maxima, I). For every $g \in C_b^3(\mathbb{R})$ and $\beta > 0$,

$$\left| \mathrm{E}[g(F_\beta(X)) - g(F_\beta(Y))] \right| \lesssim n^{-1/2} (G_3 + G_2 \beta + G_1 \beta^2)\, \bar{\mathrm{E}}[S_i^3],$$

and hence

$$\left| \mathrm{E}[g(T_0) - g(Z_0)] \right| \lesssim n^{-1/2} (G_3 + G_2 \beta + G_1 \beta^2)\, \bar{\mathrm{E}}[S_i^3] + \beta^{-1} G_1 \log p.$$

Comment 2.1 (Optimizing the bound). The theorem bounds the difference between the expectations of smooth functions of maxima. The optimal value of the last bound is given by

$$\min_{\beta > 0} \left\{ n^{-1/2} (G_3 + G_2 \beta + G_1 \beta^2)\, \bar{\mathrm{E}}[S_i^3] + \beta^{-1} G_1 \log p \right\}.$$

We postpone choices of $\beta$ to the proofs of subsequent corollaries, leaving ourselves more flexibility in optimizing bounds in those corollaries. ■
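Because the objective in Comment 2.1 is convex in $\beta > 0$, its minimizer can be found by bisection on the derivative. The sketch below treats the bound as if the implicit constant in $\lesssim$ were 1 and uses made-up values of $G_1, G_2, G_3$ and $\bar{\mathrm{E}}[S_i^3]$; the function names are hypothetical, and the code only illustrates the trade-off between the smoothing error $\beta^{-1} G_1 \log p$ and the $\beta$-dependent comparison error.

```python
import math


def theorem21_bound(beta, n, p, g1, g2, g3, es3):
    """Right side of the second bound in Theorem 2.1, with the implicit constant set to 1."""
    return (g3 + g2 * beta + g1 * beta ** 2) * es3 / math.sqrt(n) + g1 * math.log(p) / beta


def optimal_beta(n, p, g1, g2, g3, es3, tol=1e-10):
    """Minimize the (convex in beta) bound by bisection on its derivative."""
    a = es3 / math.sqrt(n)

    def deriv(beta):
        return a * (g2 + 2.0 * g1 * beta) - g1 * math.log(p) / beta ** 2

    lo, hi = 1e-12, 1.0
    while deriv(hi) < 0.0:  # expand the bracket until the derivative turns positive
        hi *= 2.0
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if deriv(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


n, p = 10_000, 1_000
beta_star = optimal_beta(n, p, g1=1.0, g2=1.0, g3=1.0, es3=1.0)
print(beta_star, theorem21_bound(beta_star, n, p, 1.0, 1.0, 1.0, 1.0))
```

When the $G_1 \beta^2$ term dominates the bracket, the first-order condition gives approximately $\beta^3 \approx \sqrt{n} \log p / (2 \bar{\mathrm{E}}[S_i^3])$.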

Deriving a bound on the Kolmogorov distance between the distributions of $T_0$ and $Z_0$ from Theorem 2.1 is not trivial, and this step relies on the following anti-concentration inequality for maxima of Gaussian random variables, which is derived in [18].

Lemma 2.1 (Anti-Concentration). Let $\xi_1, \dots, \xi_p$ be (not necessarily independent) centered Gaussian random variables with $\sigma_j^2 := \mathrm{E}[\xi_j^2] > 0$ for all $1 \le j \le p$. Let $\underline{\sigma} = \min_{1 \le j \le p} \sigma_j$ and $\bar{\sigma} = \max_{1 \le j \le p} \sigma_j$. Then for every $\varsigma > 0$,

$$\sup_{z \in \mathbb{R}} \mathrm{P}\Big( \big| \max_{1 \le j \le p} \xi_j - z \big| \le \varsigma \Big) \le C \varsigma \sqrt{1 \vee \log(p/\varsigma)},$$

where $C > 0$ is a constant depending only on $\underline{\sigma}$ and $\bar{\sigma}$. When $\sigma_j$ are all equal, $\log(p/\varsigma)$ on the right side can be replaced by $\log p$.
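The content of Lemma 2.1 can be seen in a quick simulation: the probability that the maximum falls in a window of half-width $\varsigma$ shrinks proportionally to $\varsigma$. The sketch below assumes i.i.d. standard normal coordinates (the equal-variance case, where the bound reads $C \varsigma \sqrt{\log p}$), replaces the supremum over $z$ by a maximum over a finite grid of our choosing, and uses a hypothetical helper name. The last printed column, the ratio to $\varsigma \sqrt{\log p}$, should stay bounded as $\varsigma$ shrinks.

```python
import math
import random


def concentration_prob(draws, width, grid):
    """Max over z in grid of the empirical frequency of |M - z| <= width."""
    return max(sum(1 for m in draws if abs(m - z) <= width) for z in grid) / len(draws)


rng = random.Random(0)
p, reps = 100, 4000
# M = max of p independent N(0, 1) variables, so all sigma_j are equal to 1.
draws = [max(rng.gauss(0.0, 1.0) for _ in range(p)) for _ in range(reps)]
grid = [1.0 + 0.05 * k for k in range(41)]  # candidate centres z in [1, 3]
for width in (0.4, 0.2, 0.1, 0.05):
    prob = concentration_prob(draws, width, grid)
    print(width, round(prob, 3), round(prob / (width * math.sqrt(math.log(p))), 3))
```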

By Theorem 2.1 and Lemma 2.1 we can now derive a bound on the Kolmogorov distance between the distributions of $T_0$ and $Z_0$.

Corollary 2.1 (Central Limit Theorem, I). Suppose that there are some constants $c_1 > 0$ and $C_1 > 0$ such that $c_1 \le \bar{\mathrm{E}}[x_{ij}^2] \le C_1$ for all $1 \le j \le p$. Then there exists a constant $C > 0$ depending only on $c_1$ and $C_1$ such that

$$\rho := \sup_{t \in \mathbb{R}} \left| \mathrm{P}(T_0 \le t) - \mathrm{P}(Z_0 \le t) \right| \le C \left( n^{-1} (\log(pn))^7 \right)^{1/8} \left( \bar{\mathrm{E}}[S_i^3] \right)^{1/4}.$$
Comment 2.2 (Main qualitative feature: logarithmic dependence on p). Theorem 2.1 and Corollary 2.1 imply that the error of approximating the maximum coordinate in the sum of independent random vectors by its Gaussian analogue depends on $p$ (possibly) only through $\log p$. This is the main qualitative feature of all the results in this paper. Note also that the term $\bar{\mathrm{E}}[S_i^3]$ implicitly encodes the complexity of the vectors; in particular, it reflects the correlation structure of vectors $X$ and $Y$. However, neither Theorem 2.1 and Corollary 2.1 nor any of the subsequent results given below limit the dependence among the coordinates in $x_i$. ■

Comment 2.3 (Motivation for the next result). While Theorem 2.1 and Corollary 2.1 convey an important qualitative aspect of the problem and admit easy-to-grasp proofs, an important disadvantage of these results is that the bounds depend on $\bar{\mathrm{E}}[S_i^3]$. If $\bar{\mathrm{E}}[S_i^3] \le C$, Corollary 2.1 leads to $\rho = O\big((n^{-1}(\log(pn))^7)^{1/8}\big)$ and $\rho \to 0$ as long as $\log p = o(n^{1/7})$. This is the case when, for example, as in the caption to Figure 1,

$$x_{ij} = z_{ij} \varepsilon_i, \quad z_{ij} \text{ non-stochastic with } |z_{ij}| \le C, \quad \mathrm{E}[|\varepsilon_i|^3] \le C.$$

When $\bar{\mathrm{E}}[S_i^3]$ increases with $n$, however, the bounds need not be as good, and they can be improved considerably by using a truncation method. Using such a method in conjunction with the proof strategy of Theorem 2.1, we derive in Theorem 2.2 below a bound that can be much better in the latter scenario. The improvement comes at the cost of a more involved statement involving truncation parameters. ■

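To derive our next main result, we employ a truncation method. Given a threshold level $u > 0$, define a truncated version of $x_{ij}$ by

$$\tilde{x}_{ij} := x_{ij}\, 1\big\{|x_{ij}| \le u (\bar{\mathrm{E}}[x_{ij}^2])^{1/2}\big\} - \mathrm{E}\Big[ x_{ij}\, 1\big\{|x_{ij}| \le u (\bar{\mathrm{E}}[x_{ij}^2])^{1/2}\big\} \Big]. \tag{10}$$

Let $\varphi_x(u)$ be the infimum, which is attained, over all numbers $\varphi \ge 0$ such that

$$\bar{\mathrm{E}}\Big[ x_{ij}^2\, 1\big\{|x_{ij}| > u (\bar{\mathrm{E}}[x_{ij}^2])^{1/2}\big\} \Big] \le \varphi^2\, \bar{\mathrm{E}}[x_{ij}^2] \quad \text{for all } 1 \le j \le p. \tag{11}$$

Note that the function $u \mapsto \varphi_x(u)$ is right-continuous; it measures the impact of truncation on second moments. Define $u_x(\gamma)$ as the infimum of all $u \ge 0$ such that

$$\mathrm{P}\Big( |x_{ij}| \le u (\bar{\mathrm{E}}[x_{ij}^2])^{1/2},\ 1 \le i \le n,\ 1 \le j \le p \Big) \ge 1 - \gamma.$$

Also define $\varphi_y(u)$ and $u_y(\gamma)$ by the corresponding quantities for the analogous Gaussian case, namely with $(x_i)_{i=1}^n$ replaced by $(y_i)_{i=1}^n$ in the above definitions. Throughout the paper we use the following quantities:

$$\varphi(u) := \varphi_x(u) \vee \varphi_y(u), \qquad u(\gamma) := u_x(\gamma) \vee u_y(\gamma).$$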
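The quantities $\varphi_x(u)$ and $u_x(\gamma)$ involve population expectations and probabilities. When many independent copies of the data set can be simulated, they can be approximated by Monte Carlo, as in the following sketch; the replication scheme, the Laplace example and the helper names are assumptions for illustration. By (11), $\varphi_x(u)$ is the largest over $j$ of the square root of $\bar{\mathrm{E}}[x_{ij}^2 1\{|x_{ij}| > u(\bar{\mathrm{E}}[x_{ij}^2])^{1/2}\}] / \bar{\mathrm{E}}[x_{ij}^2]$, and $u_x(\gamma)$ is the $(1-\gamma)$-quantile of $\max_{i,j} |x_{ij}| / (\bar{\mathrm{E}}[x_{ij}^2])^{1/2}$.

```python
import math
import random


def second_moments(samples):
    """Monte Carlo estimate of E-bar[x_ij^2] for each j; samples[r] is one n-by-p data set."""
    p = len(samples[0][0])
    count = sum(len(data) for data in samples)
    return [sum(row[j] ** 2 for data in samples for row in data) / count for j in range(p)]


def phi_x(samples, u):
    """Smallest phi with E-bar[x^2 1{|x| > u s_j}] <= phi^2 s_j^2 for every j, s_j^2 = E-bar[x_ij^2]."""
    m2 = second_moments(samples)
    count = sum(len(data) for data in samples)
    phi = 0.0
    for j, m in enumerate(m2):
        cut = u * math.sqrt(m)
        tail = sum(row[j] ** 2 for data in samples for row in data if abs(row[j]) > cut) / count
        phi = max(phi, math.sqrt(tail / m))
    return phi


def u_x(samples, gamma):
    """(1 - gamma)-quantile over data sets of max_{i,j} |x_ij| / sqrt(E-bar[x_ij^2])."""
    scale = [math.sqrt(m) for m in second_moments(samples)]
    stats = sorted(
        max(abs(v) / scale[j] for row in data for j, v in enumerate(row)) for data in samples
    )
    return stats[max(math.ceil((1.0 - gamma) * len(stats)) - 1, 0)]


rng = random.Random(0)
reps, n, p = 200, 50, 10
# Laplace entries: a random sign times an Exp(1) variable.
samples = [
    [[rng.choice((-1.0, 1.0)) * rng.expovariate(1.0) for _ in range(p)] for _ in range(n)]
    for _ in range(reps)
]
for u in (1.0, 2.0, 4.0):
    print(u, round(phi_x(samples, u), 3))
print("u_x(0.05) ~", round(u_x(samples, 0.05), 2))
```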

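Here is the main theorem of this section. Recall the definition of $M_k$ in (8).

Theorem 2.2 (Comparison of Gaussian to Non-Gaussian Maxima, II). Let $\beta > 0$, $u > 0$ and $\gamma \in (0, 1)$ be such that $2\sqrt{2}\, u M_2 \beta / \sqrt{n} \le 1$ and $u \ge u(\gamma)$. Then for every $g \in C_b^3(\mathbb{R})$,

$$\left| \mathrm{E}[g(F_\beta(X)) - g(F_\beta(Y))] \right| \lesssim D_n(g, \beta, u, \gamma),$$

and hence

$$\left| \mathrm{E}[g(T_0) - g(Z_0)] \right| \lesssim D_n(g, \beta, u, \gamma) + \beta^{-1} G_1 \log p,$$

where

$$D_n(g, \beta, u, \gamma) := n^{-1/2} (G_3 + G_2 \beta + G_1 \beta^2) M_3^3 + (G_2 + \beta G_1) M_2^2\, \varphi(u) + G_1 M_2\, \varphi(u) \sqrt{\log(p/\gamma)} + G_0 \gamma.$$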
By Theorem 2.2 and Lemma 2.1 we can obtain a bound on the Kolmogorov distance between the distribution functions of $T_0$ and $Z_0$.

Corollary 2.2 (Central Limit Theorem, II). Suppose that there are some constants $0 < c_1 < C_1$ such that $c_1 \le \bar{\mathrm{E}}[x_{ij}^2] \le C_1$ for $1 \le j \le p$. Then for every $\gamma \in (0, 1)$,

$$\rho \le C \Big[ n^{-1/8} \big( M_3^{3/4} \vee M_4^{1/2} \big) (\log(pn/\gamma))^{7/8} + n^{-1/2} (\log(pn/\gamma))^{3/2}\, u(\gamma) + \gamma \Big],$$

where $C > 0$ is a constant that depends on $c_1$ and $C_1$ only.

In applications it is useful to bound the upper function $u(\gamma)$. Here is a simple and effective way of doing this. Let $h: [0, \infty) \to [0, \infty)$ be a Young-Orlicz modulus, i.e., a convex and strictly increasing function with $h(0) = 0$. Denote by $h^{-1}$ the inverse function of $h$. Standard examples include the power function $h(v) = v^q$ with inverse $h^{-1}(\gamma) = \gamma^{1/q}$ and the exponential function $h(v) = \exp(v) - 1$ with inverse $h^{-1}(\gamma) = \log(\gamma + 1)$. These functions describe how many moments the random variables have; for example, a random variable $\xi$ has a finite $q$-th moment if $\mathrm{E}[|\xi|^q] < \infty$, and is sub-exponential if $\mathrm{E}[\exp(|\xi|/C)] < \infty$ for some $C > 0$. We refer to [44], Chapter 2.2, for further details on Young-Orlicz moduli.

Lemma 2.2 (Bounds on the upper function $u(\gamma)$). Let $h: [0, \infty) \to [0, \infty)$ be a Young-Orlicz modulus, and let $B > 0$ and $D > 0$ be constants such that $(\mathrm{E}[x_{ij}^2])^{1/2} \le B$ for all $1 \le i \le n$, $1 \le j \le p$, and $\bar{\mathrm{E}}[h(\max_{1 \le j \le p} |x_{ij}|/D)] \le 1$. Then under the condition of Corollary 2.2,

$$u(\gamma) \le C \max\Big\{ D\, h^{-1}(n/\gamma),\ B \sqrt{\log(pn/\gamma)} \Big\},$$

where $C > 0$ is a constant that depends on $c_1$ and $C_1$ only.

In applications, parameters $B$ and $D$ (as well as $M_3$ and $M_4$) are allowed to increase with $n$. The size of these parameters and the choice of the Young-Orlicz modulus are case-specific.
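The two standard moduli and the bound of Lemma 2.2 can be written down directly. The sketch below drops the constant $C$ (unknown here) and uses illustrative values $B = D = 1$; the function names are our own. It shows the practical difference between the moduli: with $h(v) = v^4$ the first term grows like $(n/\gamma)^{1/4}$, while with $h(v) = e^v - 1$ it grows only like $\log(n/\gamma)$.

```python
import math


def power_modulus(q):
    """h(v) = v^q (q >= 1) and its inverse h^{-1}(g) = g^{1/q}."""
    return (lambda v: v ** q), (lambda g: g ** (1.0 / q))


def exp_modulus():
    """h(v) = exp(v) - 1 and its inverse h^{-1}(g) = log(g + 1)."""
    return (lambda v: math.exp(v) - 1.0), (lambda g: math.log(g + 1.0))


def lemma22_rate(h_inv, n, p, gamma, B, D):
    """max{D h^{-1}(n/gamma), B sqrt(log(pn/gamma))}: Lemma 2.2 without the constant C."""
    return max(D * h_inv(n / gamma), B * math.sqrt(math.log(p * n / gamma)))


n, p, gamma = 1000, 5000, 0.01
moduli = {"v^4": power_modulus(4), "exp(v) - 1": exp_modulus()}
for name, (h, h_inv) in moduli.items():
    print(name, round(lemma22_rate(h_inv, n, p, gamma, B=1.0, D=1.0), 2))
```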

2.2. Examples of Applications. The purpose of this subsection is to obtain bounds on $\rho$ for various leading examples frequently encountered in applications. We are concerned with simple conditions under which $\rho$ decays polynomially in $n$.

Let $c_1 > 0$, $c_2 > 0$, $C_1 > 0$ be some constants, and let $B_n \ge 1$ be a sequence of constants. We allow for the case where $B_n \to \infty$ as $n \to \infty$. We shall first consider applications where one of the following conditions is satisfied uniformly in $1 \le i \le n$ and $1 \le j \le p$:
(E.1) $\bar{\mathrm{E}}[x_{ij}^2] \ge c_1$ and $\bar{\mathrm{E}}[S_i^3] \le C_1$;

(E.2) $\bar{\mathrm{E}}[x_{ij}^2] \ge c_1$ and $\mathrm{E}[\exp(|x_{ij}|/C_1)] \le 2$;

(E.3) $c_1 \le \bar{\mathrm{E}}[x_{ij}^2] \le C_1$ and $|x_{ij}| \le B_n$.

Comment 2.4. Condition (E.1) is perhaps the simplest example in this paper; under this condition, application of Corollary 2.1 is effective. A concrete example with condition (E.1) satisfied is the case where $x_{ij} = z_{ij}\varepsilon_i$, $z_{ij}$ are non-stochastic with $|z_{ij}| \le C$, and $\mathrm{E}[|\varepsilon_i|^3] \le C$. Conditions (E.2)-(E.5) are more elaborate, intended to cover cases where moments of the envelopes $S_i$ and higher order moments $M_3$ and $M_4$ increase with $n$. In these cases the use of Corollary 2.1 is not effective, and we shall use Corollary 2.2 instead. Condition (E.2) covers vectors $x_i$ made up of sub-exponential random variables, including sub-Gaussian ones as a special case; this example is quite often used in high-dimensional statistics. Condition (E.3) covers variables that are bounded by $B_n$, which may increase with $n$; many applications, after a suitable truncation, can be covered by it. ■

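We shall also consider regression applications where one of the following conditions is satisfied uniformly in $1 \le i \le n$ and $1 \le j \le p$:

(E.4) $x_{ij} = z_{ij}\varepsilon_{ij}$, where $z_{ij}$ are non-stochastic with $|z_{ij}| \le B_n$, $\mathbb{E}_n[z_{ij}^2] = 1$, and $\mathrm{E}[\varepsilon_{ij}] = 0$, $\mathrm{E}[\varepsilon_{ij}^2] \ge c_1$, and $\mathrm{E}[\exp(|\varepsilon_{ij}|/C_1)] \le 2$; or

(E.5) $x_{ij} = z_{ij}\varepsilon_{ij}$, where $z_{ij}$ are non-stochastic with $|z_{ij}| \le B_n$, $\mathbb{E}_n[z_{ij}^2] = 1$, and $\mathrm{E}[\varepsilon_{ij}] = 0$, $\mathrm{E}[\varepsilon_{ij}^2] \ge c_1$, and $\mathrm{E}[\max_{1 \le j \le p} \varepsilon_{ij}^4] \le C_1$.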

Comment 2.5. The last two cases cover examples that arise in high-dimensional regression, e.g., [...], which we shall revisit later in the paper. Typically, $\varepsilon_{ij}$ do not depend on $j$ (i.e., $\varepsilon_{ij} = \varepsilon_i$), and hence $\mathrm{E}[\max_{1 \le j \le p} \varepsilon_{ij}^4] \le C_1$ in condition (E.5) reduces to $\mathrm{E}[\varepsilon_i^4] \le C_1$ (we allow $\varepsilon_{ij}$ to depend on $j$ so that Corollary 2.3 covers the multiple hypothesis testing example in Appendix G). Interestingly, these examples are also connected to spin glasses; see, e.g., [...] ($z_{ij}$ can be interpreted as generalized products of "spins" and $\varepsilon_i$ as their random "interactions"). ■

Corollary 2.3 (Central Limit Theorem in Leading Examples). Suppose that one of the following conditions is satisfied: (i) condition (E.1) and $(\log(pn))^7/n \le C_1 n^{-c_2}$; (ii) condition (E.2) and $(\log(pn))^7/n \le C_1 n^{-c_2}$; (iii) condition (E.3) and $B_n^2 (\log(pn))^7/n \le C_1 n^{-c_2}$; (iv) condition (E.4) and $B_n^2 (\log(pn))^7/n \le C_1 n^{-c_2}$; or (v) condition (E.5) and $B_n^4 (\log(pn))^7/n \le C_1 n^{-c_2}$. Then there exist constants $c > 0$ and $C > 0$ depending only on $c_1$, $c_2$ and $C_1$ such that

$$\rho \le C n^{-c}.$$

Comment 2.6. Cases (ii)-(v) indeed follow relatively directly from Corollary 2.2 with the help of Lemma 2.2. Moreover, from Lemma 2.2 it is routine to find other conditions that lead to the conclusion of Corollary 2.3. ■
3. Multiplier Bootstrap

3.1. A Gaussian-to-Gaussian Comparison Theorem. The proofs of the main results in this section rely on the following lemma. Let $V$ and $Y$ be centered Gaussian random vectors in $\mathbb{R}^p$ with covariance matrices $\Sigma^V = (\Sigma^V_{jk})_{1 \le j,k \le p}$ and $\Sigma^Y = (\Sigma^Y_{jk})_{1 \le j,k \le p}$, respectively. The following lemma compares the distribution functions of $\max_{1 \le j \le p} V_j$ and $\max_{1 \le j \le p} Y_j$ in terms of $p$ and

$$\Delta_0 := \max_{1 \le j, k \le p} \left| \Sigma^V_{jk} - \Sigma^Y_{jk} \right|.$$

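Lemma 3.1 (Comparison of Distributions of Gaussian Maxima). Suppose that there are some constants $0 < c_1 < C_1$ such that $c_1 \le \Sigma^Y_{jj} \le C_1$ for all $1 \le j \le p$. Then there exists a constant $C > 0$ depending only on $c_1$ and $C_1$ such that

$$\sup_{t \in \mathbb{R}} \left| \mathrm{P}\Big( \max_{1 \le j \le p} V_j \le t \Big) - \mathrm{P}\Big( \max_{1 \le j \le p} Y_j \le t \Big) \right| \le C \Delta_0^{1/3} \big( 1 \vee \log(p/\Delta_0) \big)^{2/3}.$$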



Comment 3.1. The result is derived in [18] and extends that of [12], who gave an explicit error bound in the Sudakov-Fernique comparison of expectations of maxima of Gaussian vectors. ■
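The right side of Lemma 3.1 depends on the two covariance matrices only through $\Delta_0$ and on the dimension only through $\log p$. The sketch below computes both ingredients for a pair of equicorrelated matrices chosen for illustration; the constant $C$ is unknown, so it is omitted and only the rate is evaluated. The function names are hypothetical. The same function of $\vartheta$ reappears, with constant $C_2$, as $\pi(\vartheta)$ in Lemma 3.2.

```python
import math


def delta0(sigma_v, sigma_y):
    """Delta_0 = max_{j,k} |Sigma^V_jk - Sigma^Y_jk|."""
    return max(abs(a - b) for row_v, row_y in zip(sigma_v, sigma_y) for a, b in zip(row_v, row_y))


def lemma31_rate(d0, p):
    """Delta_0^{1/3} (1 v log(p / Delta_0))^{2/3}: the bound of Lemma 3.1 without the constant C."""
    if d0 == 0.0:
        return 0.0  # identical covariance matrices give identical distributions
    return d0 ** (1.0 / 3.0) * max(1.0, math.log(p / d0)) ** (2.0 / 3.0)


p = 100
sigma_y = [[1.0 if j == k else 0.5 for k in range(p)] for j in range(p)]
sigma_v = [[1.0 if j == k else 0.45 for k in range(p)] for j in range(p)]
d0 = delta0(sigma_v, sigma_y)
print(round(d0, 3), round(lemma31_rate(d0, p), 3))
```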

3.2. Multiplier Bootstrap Theorems. Suppose that we have a dataset $(x_i)_{i=1}^n$ consisting of $n$ independent centered random vectors $x_i$ in $\mathbb{R}^p$. In this section we are interested in approximating quantiles of

$$T_0 = \max_{1 \le j \le p} \frac{1}{\sqrt{n}} \sum_{i=1}^n x_{ij} \tag{12}$$

using the multiplier bootstrap method. Specifically, let $(e_i)_{i=1}^n$ be a sequence of i.i.d. $N(0, 1)$ variables independent of $(x_i)_{i=1}^n$, and let

$$W_0 = \max_{1 \le j \le p} \frac{1}{\sqrt{n}} \sum_{i=1}^n x_{ij} e_i. \tag{13}$$

Then we define the multiplier bootstrap estimator of the $\alpha$-quantile of $T_0$ as the conditional $\alpha$-quantile of $W_0$ given $(x_i)_{i=1}^n$, i.e.,

$$c_{W_0}(\alpha) := \inf\{ t \in \mathbb{R} : \mathrm{P}_e(W_0 \le t) \ge \alpha \},$$

where $\mathrm{P}_e$ is the probability measure induced by the multiplier variables $(e_i)_{i=1}^n$ holding $(x_i)_{i=1}^n$ fixed (i.e., $\mathrm{P}_e(W_0 \le t) = \mathrm{P}(W_0 \le t \mid (x_i)_{i=1}^n)$). The multiplier bootstrap theorem below provides a non-asymptotic bound on the bootstrap estimation error:

$$\left| \mathrm{P}\big(T_0 \le c_{W_0}(\alpha)\big) - \alpha \right|.$$
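In practice $\mathrm{P}_e$ is approximated by drawing many independent sets of multipliers. A minimal Python sketch, assuming the data are stored as a list of $n$ rows of length $p$ and that a finite number of bootstrap draws is used (which adds Monte Carlo error on top of the bootstrap error studied here); the function names are hypothetical:

```python
import math
import random


def t0_statistic(x):
    """T_0 = max_j n^{-1/2} sum_i x_ij, as in (12)."""
    n, p = len(x), len(x[0])
    return max(sum(row[j] for row in x) for j in range(p)) / math.sqrt(n)


def multiplier_bootstrap_quantile(x, alpha, reps=500, seed=0):
    """Monte Carlo c_{W_0}(alpha): conditional alpha-quantile of W_0 in (13) given x."""
    rng = random.Random(seed)
    n, p = len(x), len(x[0])
    draws = []
    for _ in range(reps):
        e = [rng.gauss(0.0, 1.0) for _ in range(n)]  # i.i.d. N(0, 1) multipliers
        draws.append(max(sum(x[i][j] * e[i] for i in range(n)) for j in range(p)) / math.sqrt(n))
    draws.sort()
    # Empirical version of inf{t : P_e(W_0 <= t) >= alpha}.
    return draws[max(math.ceil(alpha * reps) - 1, 0)]


rng = random.Random(1)
n, p = 60, 30
x = [[rng.expovariate(1.0) - 1.0 for _ in range(p)] for _ in range(n)]  # centered, skewed data
print("T_0 =", round(t0_statistic(x), 3))
print("c_W0(0.95) =", round(multiplier_bootstrap_quantile(x, 0.95), 3))
```

The data enter only through the fixed matrix `x`; all randomness in the quantile comes from the multipliers, which is what conditioning on $(x_i)_{i=1}^n$ means.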

Before presenting the theorem, we first give a simple lemma that is useful both in the proof of the theorem and in power analysis in applications. Define

$$c_{Z_0}(\alpha) := \inf\{ t \in \mathbb{R} : \mathrm{P}(Z_0 \le t) \ge \alpha \},$$

where $Z_0 = \max_{1 \le j \le p} \sum_{i=1}^n y_{ij}/\sqrt{n}$ and $(y_i)_{i=1}^n$ is a sequence of independent $N(0, \mathrm{E}[x_i x_i'])$ vectors. Recall that

$$\Delta = \max_{1 \le j, k \le p} \left| \mathbb{E}_n[x_{ij} x_{ik}] - \bar{\mathrm{E}}[x_{ij} x_{ik}] \right|.$$

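Lemma 3.2 (Comparison of Quantiles, I). Suppose that there are some constants $0 < c_1 < C_1$ such that $c_1 \le \bar{\mathrm{E}}[x_{ij}^2] \le C_1$ for all $1 \le j \le p$. Then for every $\alpha \in (0, 1)$ and $\vartheta > 0$,

$$\mathrm{P}\big( c_{W_0}(\alpha) \le c_{Z_0}(\alpha + \pi(\vartheta)) \big) \ge 1 - \mathrm{P}(\Delta > \vartheta),$$
$$\mathrm{P}\big( c_{Z_0}(\alpha) \le c_{W_0}(\alpha + \pi(\vartheta)) \big) \ge 1 - \mathrm{P}(\Delta > \vartheta),$$

where, for $C_2 > 0$ denoting a constant depending only on $c_1$ and $C_1$,

$$\pi(\vartheta) := C_2 \vartheta^{1/3} \big( 1 \vee \log(p/\vartheta) \big)^{2/3}.$$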

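Recall that $\rho := \sup_{t \in \mathbb{R}} |\mathrm{P}(T_0 \le t) - \mathrm{P}(Z_0 \le t)|$. We are now in a position to state the main theorem of this section.

Theorem 3.1 (Validity of Multiplier Bootstrap, I). Suppose that for some constants $0 < c_1 < C_1$, we have $c_1 \le \bar{\mathrm{E}}[x_{ij}^2] \le C_1$ for all $1 \le j \le p$. Then for any $\vartheta > 0$,

$$\sup_{\alpha \in (0,1)} \left| \mathrm{P}\big(T_0 \le c_{W_0}(\alpha)\big) - \alpha \right| \le \rho + \pi(\vartheta) + \mathrm{P}(\Delta > \vartheta).$$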

Theorem 3.1 provides a useful result for the case where the statistics are maxima of exact averages. There are many applications, however, where the relevant statistics arise as maxima of approximate averages. The following result shows that the theorem continues to apply if the error of approximating the relevant statistic by a maximum of an exact average can be suitably controlled. Specifically, suppose that a statistic of interest, say $T = T(x_1, \dots, x_n)$, which may not be of the form (12), can be approximated by $T_0$ of the form (12), and that the multiplier bootstrap is performed on a statistic $W = W(x_1, \dots, x_n, e_1, \dots, e_n)$, which may be different from (13) but can still be approximated by $W_0$ of the form (13).

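We require the approximation to hold in the following sense: there exist $\zeta_1 \ge 0$ and $\zeta_2 \ge 0$, depending on $n$ (and typically $\zeta_1 \to 0$, $\zeta_2 \to 0$ as $n \to \infty$), such that

$$\mathrm{P}\big(|T - T_0| > \zeta_1\big) < \zeta_2, \tag{14}$$

$$\mathrm{P}\Big(\mathrm{P}_e\big(|W - W_0| > \zeta_1\big) > \zeta_2\Big) < \zeta_2. \tag{15}$$

We use the $\alpha$-quantile of $W = W(x_1, \dots, x_n, e_1, \dots, e_n)$, computed conditional on $(x_i)_{i=1}^n$,

$$c_W(\alpha) := \inf\{ t \in \mathbb{R} : \mathrm{P}_e(W \le t) \ge \alpha \},$$

as an estimate of the $\alpha$-quantile of $T$.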
Lemma 3.3 (Comparison of Quantiles, II). Suppose that condition (15) is satisfied. Then for every $\alpha \in (0, 1)$,

$$\mathrm{P}\big( c_W(\alpha) \le c_{W_0}(\alpha + \zeta_2) + \zeta_1 \big) \ge 1 - \zeta_2,$$
$$\mathrm{P}\big( c_{W_0}(\alpha) \le c_W(\alpha + \zeta_2) + \zeta_1 \big) \ge 1 - \zeta_2.$$

The next result provides a bound on the bootstrap estimation error.

Theorem 3.2 (Validity of Multiplier Bootstrap, II). Suppose that, for some constants $0 < c_1 < C_1$, we have $c_1 \le \bar{\mathrm{E}}[x_{ij}^2] \le C_1$ for all $1 \le j \le p$. Moreover, suppose that conditions (14) and (15) are satisfied. Then for any $\vartheta > 0$,

$$\sup_{\alpha \in (0,1)} \left| \mathrm{P}\big(T \le c_W(\alpha)\big) - \alpha \right| \le \rho + \pi(\vartheta) + \mathrm{P}(\Delta > \vartheta) + C_3 \zeta_1 \sqrt{1 \vee \log(p/\zeta_1)} + \zeta_2,$$

where $\pi(\cdot)$ is defined in Lemma 3.2, and $C_3 > 0$ depends only on $c_1$ and $C_1$.

3.3. Examples of Applications: Revisited. Here we revisit the examples in Section 2.2 and see how the multiplier bootstrap works for these leading examples. Let, as before, $c_1 > 0$, $c_2 > 0$ and $C_1 > 0$ be some constants, and let $B_n \ge 1$ be a sequence of constants. Recall conditions (E.2)-(E.5) in Section 2.2.

Corollary 3.1 (Multiplier Bootstrap in Leading Examples). Suppose that conditions (14) and (15) hold with $\zeta_1 \sqrt{\log p} + \zeta_2 \le C_1 n^{-c_2}$. Moreover, suppose that one of the following conditions is satisfied: (i) condition (E.2) and $(\log(pn))^7/n \le C_1 n^{-c_2}$; (ii) condition (E.3) and $B_n^2 (\log(pn))^7/n \le C_1 n^{-c_2}$; (iii) condition (E.4) and $B_n^4 (\log(pn))^7/n \le C_1 n^{-c_2}$; or (iv) condition (E.5) and $B_n^4 (\log(pn))^7/n \le C_1 n^{-c_2}$. Then there exist constants $c > 0$ and $C > 0$ depending only on $c_1$, $c_2$ and $C_1$ such that

$$\sup_{\alpha \in (0,1)} \left| \mathrm{P}\big(T \le c_W(\alpha)\big) - \alpha \right| \le C n^{-c}.$$

Comment 3.2. This corollary shows that the multiplier bootstrap is valid with a polynomial rate of accuracy for the significance level under weak conditions. This is in contrast with the extremal theory of Gaussian processes, which provides only a logarithmic rate of approximation (see, e.g., [...] and [17]). ■



4. Application: Dantzig Selector in the Non-Gaussian Model 

The purpose of this section is to demonstrate the ease with which the CLT and the multiplier bootstrap theorem given in Corollaries 2.3 and 3.1 can be applied in important problems dealing with high-dimensional inference and estimation. We consider the Dantzig selector previously studied in the path-breaking works of [11], [3], [45] in the Gaussian setting and of [3] in a sub-exponential setting. Here we consider the non-Gaussian case, where the errors have only four bounded moments, and derive performance bounds that are approximately as sharp as in the Gaussian model. We consider both homoscedastic and heteroscedastic models.

4.1. Homoscedastic case. Let $(z_i, y_i)_{i=1}^n$ be a sample of independent observations, where $z_i \in \mathbb{R}^p$ is a non-stochastic vector of regressors. We consider the model

$$y_i = z_i'\beta + \varepsilon_i, \quad \mathrm{E}[\varepsilon_i] = 0, \quad i = 1, \dots, n, \qquad \mathbb{E}_n[z_{ij}^2] = 1, \quad j = 1, \dots, p,$$

where $y_i$ is a random scalar dependent variable, and the regressors are normalized in such a way that $\mathbb{E}_n[z_{ij}^2] = 1$. Here we consider the homoscedastic case:

$$\mathrm{E}[\varepsilon_i^2] = \sigma^2, \quad i = 1, \dots, n,$$

where $\sigma^2$ is assumed to be known (for simplicity). We allow $p$ to be substantially larger than $n$. It is well known that a condition that gives good performance for the Dantzig selector is that $\beta$ is sparse, namely $\|\beta\|_0 \le s \ll n$ (although this assumption will not be invoked below explicitly).

The aim is to estimate the vector $\beta$ in some semi-norms of interest, $\|\cdot\|_I$. For example, given an estimator $\hat\beta$, the prediction semi-norm for $\delta = \hat\beta - \beta$ is

$$\|\delta\|_{pr} := \sqrt{\mathbb{E}_n[(z_i'\delta)^2]},$$

or the $j$-th component seminorm for $\delta$ is

$$\|\delta\|_{jc} := |\delta_j|,$$

and so on. The label $I$ designates the name of a norm of interest.
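The Dantzig selector is the estimator defined by

$$\hat\beta \in \arg\min_{b \in \mathbb{R}^p} \|b\|_{\ell_1} \ \text{ subject to } \ \sqrt{n}\max_{1 \le j \le p} |\mathbb{E}_n[z_{ij}(y_i - z_i'b)]| \le \lambda, \tag{16}$$

where $\|b\|_{\ell_1} = \sum_{j=1}^p |b_j|$ is the $\ell_1$-norm. An ideal choice of the penalty level $\lambda$ is meant to ensure that

$$T_0 := \sqrt{n}\max_{1 \le j \le p} |\mathbb{E}_n[z_{ij}\varepsilon_i]| \le \lambda$$

with a prescribed probability $1 - \alpha$. Hence we would like to set the penalty level $\lambda$ equal to

$$c_{T_0}(1 - \alpha) := (1 - \alpha)\text{-quantile of } T_0$$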

(note that the $z_i$ are treated as fixed). Indeed, this penalty would take into account the correlation among the regressors, thereby adapting the performance of the estimator to the design. We can approximate this quantity using the central limit theorems derived in Section 2. Specifically, let

$$Z_0 := \sigma\sqrt{n}\max_{1 \le j \le p} |\mathbb{E}_n[z_{ij}e_i]|,$$

where the $e_i$ are i.i.d. $N(0,1)$ random variables independent of the data. We then estimate $c_{T_0}(1 - \alpha)$ by

$$c_{Z_0}(1 - \alpha) := (1 - \alpha)\text{-quantile of } Z_0.$$






Note that we can calculate $c_{Z_0}(1 - \alpha)$ numerically with any specified precision by simulation. (In a Gaussian model, the design-adaptive penalty level $c_{Z_0}(1 - \alpha)$ was proposed in [H], but its extension to non-Gaussian cases was not available up to now.)

An alternative choice of the penalty level is given by

$$c_0(1 - \alpha) := \sigma\Phi^{-1}(1 - \alpha/(2p)),$$

which is the canonical choice; see [11] and [8]. Note that the canonical choice $c_0(1 - \alpha)$ disregards the correlation among the regressors, and is therefore more conservative than $c_{Z_0}(1 - \alpha)$. Indeed, by the union bound, we see that

$$c_{Z_0}(1 - \alpha) \le c_0(1 - \alpha).$$
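The gap between the two penalty levels can be seen numerically. The sketch below is illustrative only and rests on assumptions not in the text: a hypothetical equicorrelated design (the helper `equicorrelated_design` and the value `rho = 0.9` are made up for the example), with columns rescaled so that $\mathbb{E}_n[z_{ij}^2] = 1$, and a plain Monte Carlo estimate of $c_{Z_0}(1-\alpha)$ obtained by simulating the Gaussian multipliers $e_i$ with the design held fixed.

```python
import math
import random
from statistics import NormalDist


def equicorrelated_design(n, p, rho, seed=0):
    """Hypothetical fixed design with pairwise column correlation about rho,
    rescaled so that E_n[z_ij^2] = 1 for every column j."""
    rng = random.Random(seed)
    z = []
    for _ in range(n):
        common = rng.gauss(0.0, 1.0)
        z.append([math.sqrt(rho) * common + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
                  for _ in range(p)])
    for j in range(p):
        scale = math.sqrt(sum(z[i][j] ** 2 for i in range(n)) / n)
        for i in range(n):
            z[i][j] /= scale
    return z


def c_Z0(z, sigma, alpha, draws=1000, seed=1):
    """Monte Carlo estimate of the (1 - alpha)-quantile of
    Z_0 = sigma * sqrt(n) * max_j |E_n[z_ij e_i]|, with e_i i.i.d. N(0, 1)."""
    rng = random.Random(seed)
    n, p = len(z), len(z[0])
    stats = []
    for _ in range(draws):
        e = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # sqrt(n) * E_n[z_ij e_i] equals n^{-1/2} * sum_i z_ij e_i
        col_max = max(abs(sum(z[i][j] * e[i] for i in range(n))) for j in range(p))
        stats.append(sigma * col_max / math.sqrt(n))
    stats.sort()
    k = min(draws - 1, max(0, math.ceil((1 - alpha) * draws) - 1))
    return stats[k]


def c_0(sigma, alpha, p):
    """Canonical penalty level sigma * Phi^{-1}(1 - alpha / (2p))."""
    return sigma * NormalDist().inv_cdf(1 - alpha / (2 * p))


z = equicorrelated_design(n=100, p=40, rho=0.9)
adaptive = c_Z0(z, sigma=1.0, alpha=0.05)
canonical = c_0(sigma=1.0, alpha=0.05, p=40)
print(f"c_Z0 ~ {adaptive:.3f}, c_0 = {canonical:.3f}")
```

With strongly correlated columns the simulated quantile should sit well below the canonical level, in line with the union-bound inequality above; with nearly independent columns the two values would be close.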

Our first result below shows that either of the two penalty choices, $\lambda = c_{Z_0}(1 - \alpha)$ or $\lambda = c_0(1 - \alpha)$, is approximately valid under non-Gaussian noise, under the mild moment assumption $\mathrm{E}[\varepsilon_i^4] \le \text{const}$, which replaces the canonical Gaussian noise assumption. To derive this result we apply our CLT to $T_0$ to establish that the difference between the distribution functions of $T_0$ and $Z_0$ approaches zero at polynomial speed. Indeed, $T_0$ can be represented as a maximum of averages, $T_0 = \max_{1 \le k \le 2p} n^{-1/2}\sum_{i=1}^n \tilde z_{ik}\varepsilon_i$ for $\tilde z_i = (z_i', -z_i')'$, and therefore our CLT applies.

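To derive the bound on the estimation error $\|\delta\|_I$ in a seminorm of interest, we employ the following identifiability factor:

$$\kappa_I(\beta) := \inf_{\delta \in \mathcal{R}(\beta):\ \|\delta\|_I \ne 0} \frac{\max_{1 \le j \le p} |\mathbb{E}_n[z_{ij}(z_i'\delta)]|}{\|\delta\|_I},$$

where $\mathcal{R}(\beta) := \{\delta \in \mathbb{R}^p : \|\beta + \delta\|_{\ell_1} \le \|\beta\|_{\ell_1}\}$ is the restricted set; $\kappa_I(\beta)$ is defined as $\infty$ if $\mathcal{R}(\beta) = \{0\}$ (this happens if $\beta = 0$). The factors summarize the impact of the sparsity of the true parameter value $\beta$ and of the design on the identifiability of $\beta$ with respect to the norm $\|\cdot\|_I$.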

Comment 4.1 (A comment on the identifiability factor $\kappa_I(\beta)$). The identifiability factors $\kappa_I(\beta)$ depend on the true parameter value $\beta$. This is not the main focus of this section, but we note that these factors represent a modest generalization of the cone invertibility factors and sensitivity characteristics defined in [45] and [13], which are known to be quite general. The main difference is perhaps the use of a norm of interest $\|\cdot\|_I$ instead of the $\ell_q$ norms, and the use of a smaller (non-conic) restricted set $\mathcal{R}(\beta)$ in the definition. It is useful to note for later comparisons that in the case of the prediction norm $\|\cdot\|_I = \|\cdot\|_{pr}$ and under the exact sparsity assumption $\|\beta\|_0 \le s$, we have

$$\kappa_{pr}(\beta) \ge 2^{-1}s^{-1/2}\kappa(s, 1), \tag{17}$$

where $\kappa(s, 1)$ is the restricted eigenvalue defined in [8]. ■
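For completeness, a short derivation of (17) is sketched below. It assumes that $\kappa(s,1)$ is the restricted eigenvalue in the sense of [8], i.e. $\|\delta\|_{pr} \ge \kappa(s,1)\|\delta_J\|_2$ whenever $|J| \le s$ and $\|\delta_{J^c}\|_{\ell_1} \le \|\delta_J\|_{\ell_1}$; $T$ denotes the support of $\beta$.

```latex
\begin{aligned}
&\delta \in \mathcal{R}(\beta),\ T := \operatorname{supp}(\beta),\ |T| \le s:\quad
\|\beta_T\|_{\ell_1} - \|\delta_T\|_{\ell_1} + \|\delta_{T^c}\|_{\ell_1}
\le \|\beta + \delta\|_{\ell_1} \le \|\beta\|_{\ell_1}
\ \Rightarrow\ \|\delta_{T^c}\|_{\ell_1} \le \|\delta_T\|_{\ell_1},\\
&\|\delta\|_{pr}^2 = \sum_{j=1}^p \delta_j\,\mathbb{E}_n[z_{ij}(z_i'\delta)]
\le \|\delta\|_{\ell_1}\max_{1\le j\le p}|\mathbb{E}_n[z_{ij}(z_i'\delta)]|,\\
&\|\delta\|_{\ell_1} \le 2\|\delta_T\|_{\ell_1} \le 2\sqrt{s}\,\|\delta_T\|_2
\le 2\sqrt{s}\,\|\delta\|_{pr}/\kappa(s,1),\\
&\Rightarrow\ \frac{\max_{1\le j\le p}|\mathbb{E}_n[z_{ij}(z_i'\delta)]|}{\|\delta\|_{pr}}
\ge \frac{\|\delta\|_{pr}}{\|\delta\|_{\ell_1}} \ge \frac{\kappa(s,1)}{2\sqrt{s}},
\qquad\text{so } \kappa_{pr}(\beta) \ge 2^{-1}s^{-1/2}\kappa(s,1).
\end{aligned}
```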

Next we state bounds on the estimation error for the Dantzig selector $\hat\beta^{(0)}$ with canonical penalty level $\lambda = \lambda^{(0)} := c_0(1 - \alpha)$ and the Dantzig selector $\hat\beta^{(1)}$ with design-adaptive penalty level $\lambda = \lambda^{(1)} := c_{Z_0}(1 - \alpha)$.

Theorem 4.1 (Performance of Dantzig Selector in Non-Gaussian Model). Suppose that there are some constants $c_1 > 0$, $C_1 > 0$ and $\sigma^2 > 0$, and a sequence $B_n \ge 1$ of constants such that for all $1 \le i \le n$ and $1 \le j \le p$: (i) $|z_{ij}| \le B_n$; (ii) $\mathbb{E}_n[z_{ij}^2] = 1$; (iii) $\mathrm{E}[\varepsilon_i^2] = \sigma^2$; (iv) $\mathrm{E}[\varepsilon_i^4] \le C_1$; and (v) $B_n^4(\log(pn))^7/n \le C_1 n^{-c_1}$. Then there exist constants $c > 0$ and $C > 0$ depending only on $c_1$, $C_1$ and $\sigma^2$ such that, with probability at least $1 - \alpha - Cn^{-c}$, for either $k = 0$ or $1$,

$$\|\hat\beta^{(k)} - \beta\|_I \le \frac{2\lambda^{(k)}}{\sqrt{n}\,\kappa_I(\beta)}.$$
The most important feature of this result is that it provides Gaussian-like conclusions (as explained below) in a model with non-Gaussian noise having only four bounded moments. However, the probabilistic guarantee is not $1 - \alpha$ as, e.g., in [8], but rather $1 - \alpha - Cn^{-c}$, which reflects the cost of non-Gaussianity (along with more stringent side conditions). In what follows we discuss details of this result. Note that the bound above holds for any seminorm of interest $\|\cdot\|_I$.

Comment 4.2 (Improved Performance from Design-Adaptive Penalty Level). The use of the design-adaptive penalty level implies a better performance guarantee for $\hat\beta^{(1)}$ over $\hat\beta^{(0)}$. Indeed, we have

$$\frac{2c_{Z_0}(1 - \alpha)}{\sqrt{n}\,\kappa_I(\beta)} \le \frac{2c_0(1 - \alpha)}{\sqrt{n}\,\kappa_I(\beta)}.$$

E.g., in some designs we can have $\sqrt{n}\max_{1 \le j \le p}|\mathbb{E}_n[z_{ij}e_i]| = O_P(1)$, so that $c_{Z_0}(1 - \alpha) = O(1)$, whereas $c_0(1 - \alpha) \propto \sqrt{\log p}$. Thus, the performance guarantee provided by $\hat\beta^{(1)}$ can be much better than that of $\hat\beta^{(0)}$. ■

Comment 4.3 (Relation to the previous results under Gaussianity). To compare with the previous results obtained for the Gaussian settings, let us focus on the prediction norm and on the estimator $\hat\beta^{(1)}$ with penalty level $\lambda = c_{Z_0}(1 - \alpha)$. Suppose that the true value $\beta$ is sparse, namely $\|\beta\|_0 \le s$. In this case, with probability at least $1 - \alpha - Cn^{-c}$,

$$\|\hat\beta^{(1)} - \beta\|_{pr} \le \frac{2c_{Z_0}(1 - \alpha)}{\sqrt{n}\,\kappa_{pr}(\beta)} \le \frac{4\sqrt{s}\,c_{Z_0}(1 - \alpha)}{\sqrt{n}\,\kappa(s, 1)} \le \frac{4\sqrt{s}\,\sigma\sqrt{2\log(2p/\alpha)}}{\sqrt{n}\,\kappa(s, 1)}, \tag{18}$$

where the last bound is the same as in [8, Theorem 7.1], obtained for the Gaussian case. We recover the same (or a tighter) upper bound without making the Gaussianity assumption on the errors. However, the probabilistic guarantee is not $1 - \alpha$ as in [8], but rather $1 - \alpha - Cn^{-c}$, which, together with the side conditions, is the cost of non-Gaussianity. ■

Comment 4.4 (Other refinements). Unrelated to the main theme of this paper, we can see from (18) that there is some tightening of the performance bound due to the use of the identifiability factor $\kappa_{pr}(\beta)$ in place of the restricted eigenvalue $\kappa(s, 1)$; for example, if $p = 2$ and $s = 1$ and the two regressors are identical, then $\kappa_{pr}(\beta) > 0$, whereas $\kappa(1, 1) = 0$. There is also some tightening due to the use of $c_{Z_0}(1 - \alpha)$ instead of $c_0(1 - \alpha)$ as the penalty level, as mentioned above. ■

4.2. Heteroscedastic case. We consider the same model as above, except that now the assumption on the error becomes

$$\sigma_i^2 := \mathrm{E}[\varepsilon_i^2] \le \bar\sigma^2, \quad i = 1, \dots, n,$$

i.e., $\bar\sigma^2$ is the upper bound on the conditional variance, and we assume that this bound is known (for simplicity). As before, ideally we would like to set the penalty level $\lambda$ equal to

$$c_{T_0}(1 - \alpha) := (1 - \alpha)\text{-quantile of } T_0$$

(where $T_0$ is defined above, and we note that the $z_i$ are treated as fixed). The CLT applies as before, namely the difference of the distribution functions of $T_0$ and its Gaussian analogue $Z_0$ converges to zero. In this case, the Gaussian analogue can be represented as

$$Z_0 := \sqrt{n}\max_{1 \le j \le p} |\mathbb{E}_n[z_{ij}\sigma_i e_i]|.$$

Unlike in the homoscedastic case, the covariance structure is no longer known, since the $\sigma_i$ are unknown, and we can no longer calculate the quantiles of $Z_0$. However, we can estimate them using the following multiplier bootstrap procedure.

First, we estimate the residuals $\hat\varepsilon_i = y_i - z_i'\hat\beta^{(0)}$ obtained from a preliminary Dantzig selector $\hat\beta^{(0)}$ with the conservative penalty level $\lambda = \lambda^{(0)} := c_0(1 - 1/n) := \bar\sigma\Phi^{-1}(1 - 1/(2pn))$, where $\bar\sigma^2$ is the upper bound on the error variance assumed to be known. Let $(e_i)_{i=1}^n$ be a sequence of i.i.d. standard Gaussian random variables, and let

$$W := \sqrt{n}\max_{1 \le j \le p} |\mathbb{E}_n[z_{ij}\hat\varepsilon_i e_i]|.$$

Then we estimate $c_{Z_0}(1 - \alpha)$ by

$$c_W(1 - \alpha) := (1 - \alpha)\text{-quantile of } W,$$

defined conditionally on the data $(z_i, y_i)_{i=1}^n$. Note that $c_W(1 - \alpha)$ can be calculated numerically with any specified precision by simulation. Then we apply program (16) with $\lambda = \lambda^{(1)} = c_W(1 - \alpha)$ to obtain $\hat\beta^{(1)}$.
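The bootstrap step can be sketched as follows. This is a minimal illustration, not the authors' implementation: it takes the preliminary residuals $\hat\varepsilon_i$ as given (the hypothetical `resid` list stands in for them, since fitting $\hat\beta^{(0)}$ would require a linear-programming solver), draws the Gaussian multipliers $e_i$ with the data held fixed, and reads off the empirical $(1-\alpha)$-quantile of $W$. The design and residuals generated below are made-up stand-ins.

```python
import math
import random


def c_W(z, resid, alpha, draws=500, seed=0):
    """(1 - alpha)-quantile, conditional on the data, of the multiplier bootstrap
    statistic W = sqrt(n) * max_j |E_n[z_ij * resid_i * e_i]|, e_i i.i.d. N(0, 1).
    `resid` plays the role of the preliminary residuals hat-epsilon_i."""
    rng = random.Random(seed)
    n, p = len(z), len(z[0])
    stats = []
    for _ in range(draws):
        e = [rng.gauss(0.0, 1.0) for _ in range(n)]
        w = [resid[i] * e[i] for i in range(n)]  # multiplier-weighted residuals
        col_max = max(abs(sum(z[i][j] * w[i] for i in range(n))) for j in range(p))
        stats.append(col_max / math.sqrt(n))
    stats.sort()
    k = min(draws - 1, max(0, math.ceil((1 - alpha) * draws) - 1))
    return stats[k]


# Hypothetical data: a bounded design with roughly unit second moments and
# heteroscedastic stand-ins for the residuals.
data_rng = random.Random(42)
n, p = 100, 30
z = [[data_rng.uniform(-math.sqrt(3), math.sqrt(3)) for _ in range(p)] for _ in range(n)]
resid = [(0.5 + data_rng.random()) * data_rng.gauss(0.0, 1.0) for _ in range(n)]
lam1 = c_W(z, resid, alpha=0.05)
print(f"bootstrap penalty c_W(0.95) ~ {lam1:.3f}")
```

Fixing the seed makes the conditional quantile reproducible; the resulting value would then be passed as $\lambda^{(1)}$ to program (16).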

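Theorem 4.2 (Performance of Dantzig in Non-Gaussian Model with Bootstrap Penalty Level). Suppose that there are some constants $c_1 > 0$, $C_1 > 0$, $\underline\sigma^2 > 0$ and $\bar\sigma^2 > 0$, and a sequence $B_n \ge 1$ of constants such that for all $1 \le i \le n$ and $1 \le j \le p$: (i) $|z_{ij}| \le B_n$; (ii) $\mathbb{E}_n[z_{ij}^2] = 1$; (iii) $\underline\sigma^2 \le \mathrm{E}[\varepsilon_i^2] \le \bar\sigma^2$; (iv) $\mathrm{E}[\varepsilon_i^4] \le C_1$; (v) $B_n^4(\log(pn))^7/n \le C_1 n^{-c_1}$; and (vi) $(\log p)B_n c_0(1 - 1/n)/(\sqrt{n}\,\kappa_{pr}(\beta)) \le C_1 n^{-c_1}$. Then there exist constants $c > 0$ and $C > 0$ depending only on $c_1$, $C_1$, $\underline\sigma^2$ and $\bar\sigma^2$ such that, with probability at least $1 - \alpha - \nu_n$ where $\nu_n = Cn^{-c}$, we have

$$\|\hat\beta^{(1)} - \beta\|_I \le \frac{2\lambda^{(1)}}{\sqrt{n}\,\kappa_I(\beta)}.$$

Moreover, with probability at least $1 - \nu_n$,

$$\lambda^{(1)} = c_W(1 - \alpha) \le c_{Z_0}(1 - \alpha + \nu_n),$$

where $c_{Z_0}(1 - \alpha) := (1 - \alpha)$-quantile of $Z_0$; in particular, $c_{Z_0}(1 - \alpha) \le c_0(1 - \alpha)$.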

4.3. Some Extensions. Here we comment on some additional potential applications.

Comment 4.5 (Confidence Sets). Note that the bounds given in the preceding theorems can be used for inference on $\beta$ or on components of $\beta$, given the assumption $\kappa_I(\beta) \ge \kappa$, where $\kappa$ is a known constant. For example, consider inference on the $j$-th component $\beta_j$ of $\beta$. In this case, we take the norm of interest to be $\|\delta\|_{jc} = |\delta_j|$ on $\mathbb{R}^p$, and consider the corresponding identifiability factor $\kappa_{jc}(\beta)$. Suppose it is known that $\kappa_{jc}(\beta) \ge \kappa$. Then a $(1 - \alpha - Cn^{-c})$-confidence interval for $\beta_j$ is given by

$$\{b \in \mathbb{R} : |\hat\beta_j^{(k)} - b| \le 2\lambda^{(k)}/(\sqrt{n}\,\kappa)\}.$$

This confidence set is of interest, but it does require the investigator to take a stance on what a plausible $\kappa$ should be. We refer to [13] for a justification of confidence sets of this type and possible ways of computing lower bounds on $\kappa$; there is also work by [29], which provides computable lower bounds on related quantities. ■
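Computing the interval of Comment 4.5 is a one-line formula once $\hat\beta^{(k)}_j$, $\lambda^{(k)}$ and the assumed lower bound $\kappa$ are in hand. The helper name `component_ci` and the numbers in the usage line are hypothetical; the validity of the interval rests entirely on the investigator's stance that $\kappa_{jc}(\beta) \ge \kappa$.

```python
import math


def component_ci(beta_hat_j, lam, n, kappa):
    """Interval {b : |beta_hat_j - b| <= 2 * lam / (sqrt(n) * kappa)} from Comment 4.5.
    `kappa` is an assumed, known lower bound on kappa_jc(beta)."""
    if kappa <= 0:
        raise ValueError("kappa must be a positive lower bound")
    half_width = 2.0 * lam / (math.sqrt(n) * kappa)
    return beta_hat_j - half_width, beta_hat_j + half_width


# Hypothetical inputs: half-width = 2 * 2 / (10 * 0.5) = 0.8
lo, hi = component_ci(beta_hat_j=0.5, lam=2.0, n=100, kappa=0.5)
print(lo, hi)
```

A smaller assumed $\kappa$ widens the interval proportionally, which makes the cost of a conservative stance on $\kappa$ explicit.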

Comment 4.6 (Generalization of Dantzig Selector). There are many interesting applications where the results given above apply. There are, for example, interesting works by [l| and [l^] that consider related estimators that minimize a convex penalty subject to multiresolution screening constraints. In the context of the regression problem studied above, such estimators may be defined as

$$\hat\beta \in \arg\min_{b \in \mathbb{R}^p} J(b) \ \text{ subject to } \ \sqrt{n}\max_{1 \le j \le p} |\mathbb{E}_n[z_{ij}(y_i - z_i'b)]| \le \lambda,$$

where $J$ is a convex penalty, and the constraint is used for multiresolution screening. For example, the Lasso estimator is nested by the above formulation by using $J(b) = \|b\|_{pr}$, and the previous Dantzig selector by using $J(b) = \|b\|_{\ell_1}$; the estimators can be interpreted as a point in the confidence set for $\beta$ that lies closest to zero under the $J$-discrepancy (see the references above for both of these points). Our results on choosing $\lambda$ apply to this class of estimators, and the previous analysis also applies by redefining the identifiability factor $\kappa_I(\beta)$ relative to the new restricted set $\mathcal{R}(\beta) := \{\delta \in \mathbb{R}^p : J(\beta + \delta) \le J(\beta)\}$, where $\kappa_I(\beta)$ is defined as $\infty$ if $\mathcal{R}(\beta) = \{0\}$. ■

Appendix A. Preliminaries 



A.1. A Useful Maximal Inequality. The following lemma, which is derived in [18], is a useful variation of standard maximal inequalities.

Lemma A.1 (Maximal Inequality). Let $x_1, \dots, x_n$ be independent random vectors in $\mathbb{R}^p$ with $p \ge 2$. Let $M = \max_{1 \le i \le n}\max_{1 \le j \le p}|x_{ij}|$ and $\sigma^2 = \max_{1 \le j \le p}\bar{\mathrm{E}}[x_{ij}^2]$. Then

$$\mathrm{E}\Big[\max_{1 \le j \le p}|\mathbb{E}_n[x_{ij}] - \bar{\mathrm{E}}[x_{ij}]|\Big] \lesssim \sigma\sqrt{(\log p)/n} + \sqrt{\mathrm{E}[M^2]}\,(\log p)/n.$$

Proof. See [18, Lemma 8]. ■

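A.2. Properties of the Smooth Max Function. We will use the following properties of the smooth max function.

Lemma A.2 (Properties of $F_\beta$). For every $1 \le j, k, l \le p$,

$$\partial_j F_\beta(z) = \pi_j(z), \qquad \partial_j\partial_k F_\beta(z) = \beta w_{jk}(z), \qquad \partial_j\partial_k\partial_l F_\beta(z) = \beta^2 q_{jkl}(z),$$

where, for $\delta_{jk} := 1\{j = k\}$,

$$\pi_j(z) := e^{\beta z_j}\Big/\sum_{m=1}^p e^{\beta z_m}, \qquad w_{jk}(z) := (\pi_j\delta_{jk} - \pi_j\pi_k)(z),$$
$$q_{jkl}(z) := (\pi_j\delta_{jl}\delta_{jk} - \pi_j\pi_l\delta_{jk} - \pi_j\pi_k(\delta_{jl} + \delta_{kl}) + 2\pi_j\pi_k\pi_l)(z).$$

Moreover,

$$\pi_j(z) \ge 0, \qquad \sum_{j=1}^p \pi_j(z) = 1, \qquad \sum_{j,k=1}^p |w_{jk}(z)| \le 2, \qquad \sum_{j,k,l=1}^p |q_{jkl}(z)| \le 6.$$

Proof of Lemma A.2. The first property was noted in [12]. The other properties follow from repeated application of the chain rule. ■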
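The derivative formulas of Lemma A.2 can be spot-checked numerically. The sketch below (function names are ours, chosen for illustration) compares central finite differences of $F_\beta$ with $\pi_j$, and finite differences of $\pi_j$ with $\beta w_{jk}$, at a random point; it also evaluates $\sum_{j,k}|w_{jk}|$, which equals $2(1 - \sum_j \pi_j^2) \le 2$.

```python
import math
import random


def smooth_max(z, beta):
    """F_beta(z) = beta^{-1} log sum_j exp(beta z_j), evaluated stably."""
    m = max(z)
    return m + math.log(sum(math.exp(beta * (zj - m)) for zj in z)) / beta


def pi(z, beta):
    """pi_j(z) = exp(beta z_j) / sum_m exp(beta z_m)."""
    m = max(z)
    w = [math.exp(beta * (zj - m)) for zj in z]
    s = sum(w)
    return [wj / s for wj in w]


def w_matrix(z, beta):
    """w_jk(z) = pi_j delta_jk - pi_j pi_k."""
    q = pi(z, beta)
    return [[q[j] * (j == k) - q[j] * q[k] for k in range(len(z))] for j in range(len(z))]


def bump(z, k, h):
    """Copy of z with coordinate k shifted by h."""
    out = list(z)
    out[k] += h
    return out


rng = random.Random(0)
beta, h, p = 2.0, 1e-5, 5
z = [rng.gauss(0.0, 1.0) for _ in range(p)]
# Central differences: dF/dz_j should match pi_j, and d pi_j / dz_k should match beta * w_jk.
grad_err = max(abs((smooth_max(bump(z, j, h), beta) - smooth_max(bump(z, j, -h), beta)) / (2 * h)
                   - pi(z, beta)[j]) for j in range(p))
hess_err = max(abs((pi(bump(z, k, h), beta)[j] - pi(bump(z, k, -h), beta)[j]) / (2 * h)
                   - beta * w_matrix(z, beta)[j][k]) for j in range(p) for k in range(p))
abs_sum_w = sum(abs(x) for row in w_matrix(z, beta) for x in row)
print(grad_err, hess_err, abs_sum_w)
```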

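Lemma A.3 (Lipschitz Property of $F_\beta$). For every $x \in \mathbb{R}^p$ and $z \in \mathbb{R}^p$, we have $|F_\beta(x) - F_\beta(z)| \le \max_{1 \le j \le p}|x_j - z_j|$.

Proof of Lemma A.3. For some $t \in [0, 1]$,

$$|F_\beta(x) - F_\beta(z)| = \Big|\sum_{j=1}^p \partial_j F_\beta(x + t(z - x))(z_j - x_j)\Big| \le \sum_{j=1}^p \pi_j(x + t(z - x))\max_{1 \le j \le p}|z_j - x_j| = \max_{1 \le j \le p}|z_j - x_j|,$$

where the property $\sum_{j=1}^p \pi_j(x + t(z - x)) = 1$ was used. ■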
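A quick randomized check of Lemma A.3, together with the smooth-max approximation property $0 \le F_\beta(z) - \max_j z_j \le \beta^{-1}\log p$ used throughout, can be written as below. The dimensions, $\beta$, and the Gaussian test points are arbitrary choices for illustration.

```python
import math
import random


def smooth_max(z, beta):
    """F_beta(z) = beta^{-1} log sum_j exp(beta z_j), evaluated stably."""
    m = max(z)
    return m + math.log(sum(math.exp(beta * (zj - m)) for zj in z)) / beta


rng = random.Random(1)
p, beta = 50, 3.0
worst_gap, worst_lip = 0.0, 0.0
for _ in range(200):
    x = [rng.gauss(0.0, 1.0) for _ in range(p)]
    z = [rng.gauss(0.0, 1.0) for _ in range(p)]
    gap = smooth_max(x, beta) - max(x)  # should lie in [0, log(p) / beta]
    assert -1e-12 <= gap <= math.log(p) / beta + 1e-12
    worst_gap = max(worst_gap, gap)
    # Lemma A.3: |F(x) - F(z)| <= max_j |x_j - z_j|; track the largest ratio seen
    ratio = abs(smooth_max(x, beta) - smooth_max(z, beta)) / max(abs(a - b) for a, b in zip(x, z))
    worst_lip = max(worst_lip, ratio)
print(worst_gap, math.log(p) / beta, worst_lip)
```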

We will also use the following properties of $m = g \circ F_\beta$. Here we assume $g \in C_b^3(\mathbb{R})$ in Lemmas A.4-A.6 below.

Lemma A.4 (Three derivatives of $m = g \circ F_\beta$). For every $1 \le j, k, l \le p$,

$$\partial_j m(z) = (\partial g(F_\beta)\pi_j)(z),$$
$$\partial_j\partial_k m(z) = (\partial^2 g(F_\beta)\pi_j\pi_k + \partial g(F_\beta)\beta w_{jk})(z),$$
$$\partial_j\partial_k\partial_l m(z) = (\partial^3 g(F_\beta)\pi_j\pi_k\pi_l + \partial^2 g(F_\beta)\beta(w_{jk}\pi_l + w_{jl}\pi_k + w_{kl}\pi_j) + \partial g(F_\beta)\beta^2 q_{jkl})(z),$$

where $\pi_j$, $w_{jk}$ and $q_{jkl}$ are defined in Lemma A.2, and $(z)$ denotes evaluation at $z$, including evaluation of $F_\beta$ at $z$.

Proof of Lemma A.4. The proof follows from repeated application of the chain rule and the properties noted in Lemma A.2. ■
Lemma A.5 (Bounds on derivatives of $m = g \circ F_\beta$). For every $1 \le j, k, l \le p$,

$$|\partial_j\partial_k m(z)| \le U_{jk}(z), \qquad |\partial_j\partial_k\partial_l m(z)| \le U_{jkl}(z),$$

where

$$U_{jk}(z) := (G_2\pi_j\pi_k + G_1\beta W_{jk})(z), \qquad W_{jk}(z) := (\pi_j\delta_{jk} + \pi_j\pi_k)(z),$$
$$U_{jkl}(z) := (G_3\pi_j\pi_k\pi_l + G_2\beta(W_{jk}\pi_l + W_{jl}\pi_k + W_{kl}\pi_j) + G_1\beta^2 Q_{jkl})(z),$$
$$Q_{jkl}(z) := (\pi_j\delta_{jl}\delta_{jk} + \pi_j\pi_l\delta_{jk} + \pi_j\pi_k(\delta_{jl} + \delta_{kl}) + 2\pi_j\pi_k\pi_l)(z).$$

Moreover,

$$\sum_{j,k=1}^p U_{jk}(z) \le (G_2 + 2G_1\beta), \qquad \sum_{j,k,l=1}^p U_{jkl}(z) \le (G_3 + 6G_2\beta + 6G_1\beta^2).$$

Proof of Lemma A.5. The lemma follows from a direct calculation. ■

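Lemma A.6 (Stability). For every $z \in \mathbb{R}^p$ and $w \in \mathbb{R}^p$ such that $\max_{j \le p}|w_j|\beta \le 1$, every $\tau \in [0, 1]$, and every $1 \le j, k, l \le p$, we have

$$U_{jk}(z) \lesssim U_{jk}(z + \tau w) \lesssim U_{jk}(z), \qquad U_{jkl}(z) \lesssim U_{jkl}(z + \tau w) \lesssim U_{jkl}(z).$$

Proof of Lemma A.6. Observe that

$$\pi_j(z + \tau w) = \frac{e^{\beta z_j + \beta\tau w_j}}{\sum_{m=1}^p e^{\beta z_m + \beta\tau w_m}} \le \frac{e^{\beta z_j + 1}}{\sum_{m=1}^p e^{\beta z_m - 1}} = e^2\pi_j(z).$$

Similarly, $\pi_j(z + \tau w) \ge e^{-2}\pi_j(z)$. Since $U_{jk}$ and $U_{jkl}$ are finite sums of products of terms such as $\pi_j$, $\pi_k$, $\pi_l$, $\delta_{jk}$, the claim of the lemma follows. ■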
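The two-sided bound $e^{-2}\pi_j(z) \le \pi_j(z + \tau w) \le e^{2}\pi_j(z)$ behind Lemma A.6 can be checked on random inputs. In this illustrative sketch the perturbation $w$ is drawn so that $\beta\max_j|w_j| \le 1$, as the lemma requires; all numerical settings are arbitrary.

```python
import math
import random


def pi(z, beta):
    """Softmax weights pi_j(z) = exp(beta z_j) / sum_m exp(beta z_m)."""
    m = max(z)
    w = [math.exp(beta * (zj - m)) for zj in z]
    s = sum(w)
    return [wj / s for wj in w]


rng = random.Random(2)
p, beta = 20, 4.0
worst_hi, worst_lo = 0.0, float("inf")
for _ in range(500):
    z = [rng.gauss(0.0, 1.0) for _ in range(p)]
    # perturbation satisfying beta * max_j |w_j| <= 1
    w = [rng.uniform(-1.0, 1.0) / beta for _ in range(p)]
    tau = rng.random()
    base = pi(z, beta)
    moved = pi([zj + tau * wj for zj, wj in zip(z, w)], beta)
    ratios = [a / b for a, b in zip(moved, base)]
    worst_hi = max(worst_hi, max(ratios))
    worst_lo = min(worst_lo, min(ratios))
print(worst_lo, worst_hi, math.exp(-2), math.exp(2))
```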

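A.3. Lemma on Truncation. The proof of Theorem 2.2 uses the following properties of the truncation operation. Recall that $\tilde x_i = (\tilde x_{ij})_{j=1}^p$ and $\tilde X = n^{-1/2}\sum_{i=1}^n \tilde x_i$, where the tilde denotes the truncation operation defined in Section 2. The following lemma also covers the special case where $(x_i)_{i=1}^n = (y_i)_{i=1}^n$. The property (d) is a consequence of the sub-Gaussian inequality of [19, Theorem 2.16] for self-normalized sums.

Lemma A.7 (Truncation Impact). For every $1 \le j, k \le p$ and $q \ge 1$,

(a) $(\bar{\mathrm{E}}[|\tilde x_{ij}|^q])^{1/q} \le 2(\bar{\mathrm{E}}[|x_{ij}|^q])^{1/q}$;
(b) $\bar{\mathrm{E}}[|\tilde x_{ij}\tilde x_{ik} - x_{ij}x_{ik}|] \le (3/2)(\bar{\mathrm{E}}[x_{ij}^2] + \bar{\mathrm{E}}[x_{ik}^2])\varphi(u)$;
(c) $\mathbb{E}_n[(\mathrm{E}[x_{ij} - \tilde x_{ij}])^2] \le \bar{\mathrm{E}}[(x_{ij} - \tilde x_{ij})^2] \le \bar{\mathrm{E}}[x_{ij}^2]\varphi^2(u)$.

Moreover, for a given $\gamma \in (0, 1)$, let $u \ge u(\gamma)$, where $u(\gamma)$ is defined in Section 2. Then:

(d) with probability at least $1 - 5\gamma$, for all $1 \le j \le p$,

$$|X_j - \tilde X_j| \le 5(\bar{\mathrm{E}}[x_{ij}^2])^{1/2}\varphi(u)\sqrt{2\log(p/\gamma)}.$$

Proof. See Appendix F. ■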
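Appendix B. Proofs for Section 2

B.1. Proof of Theorem 2.1. Recall that we are assuming that the sequences $(x_i)_{i=1}^n$ and $(y_i)_{i=1}^n$ are independent. For $t \in [0, 1]$, we consider the Slepian interpolation between $Y$ and $X$:

$$Z(t) := \sqrt{t}X + \sqrt{1 - t}Y = \sum_{i=1}^n Z_i(t), \qquad Z_i(t) := \frac{1}{\sqrt{n}}(\sqrt{t}x_i + \sqrt{1 - t}y_i).$$

We shall also employ Stein's leave-one-out expansions:

$$Z^{(i)}(t) := (Z_j^{(i)}(t))_{j=1}^p := Z(t) - Z_i(t).$$

Let $\Psi(t) = \mathrm{E}[m(Z(t))]$ for $m := g \circ F_\beta$, and let

$$\dot Z_{ij}(t) := \frac{1}{\sqrt{n}}\Big(\frac{x_{ij}}{\sqrt{t}} - \frac{y_{ij}}{\sqrt{1 - t}}\Big).$$

Then by Taylor's theorem,

$$\mathrm{E}[m(X) - m(Y)] = \Psi(1) - \Psi(0) = \int_0^1 \Psi'(t)\,dt = \frac{1}{2}\sum_{j=1}^p\sum_{i=1}^n\int_0^1 \mathrm{E}[\partial_j m(Z(t))\dot Z_{ij}(t)]\,dt = \frac{1}{2}(I + II + III),$$

where

$$I := \sum_{j=1}^p\sum_{i=1}^n\int_0^1 \mathrm{E}[\partial_j m(Z^{(i)}(t))\dot Z_{ij}(t)]\,dt,$$
$$II := \sum_{j,k=1}^p\sum_{i=1}^n\int_0^1 \mathrm{E}[\partial_j\partial_k m(Z^{(i)}(t))\dot Z_{ij}(t)Z_{ik}(t)]\,dt,$$
$$III := \sum_{j,k,l=1}^p\sum_{i=1}^n\int_0^1\int_0^1 (1 - \tau)\,\mathrm{E}[\partial_j\partial_k\partial_l m(Z^{(i)}(t) + \tau Z_i(t))\dot Z_{ij}(t)Z_{ik}(t)Z_{il}(t)]\,d\tau\,dt.$$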



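Note that the random vector $Z^{(i)}(t)$ and the random vector $(\dot Z_{ij}(t), Z_{ij}(t))$ are independent, and $\mathrm{E}[\dot Z_{ij}(t)] = 0$. Hence we have $I = 0$; moreover, since $\mathrm{E}[\dot Z_{ij}(t)Z_{ik}(t)] = n^{-1}\mathrm{E}[x_{ij}x_{ik} - y_{ij}y_{ik}] = 0$ by construction of $(y_i)_{i=1}^n$, we also have $II = 0$. Consider the third term $III$. We have that

$$|III| \lesssim_{(1)} (G_3 + G_2\beta + G_1\beta^2)\,n\int_0^1 \bar{\mathrm{E}}\Big[\max_{1 \le j,k,l \le p}|\dot Z_{ij}(t)Z_{ik}(t)Z_{il}(t)|\Big]dt \lesssim_{(2)} n^{-1/2}(G_3 + G_2\beta + G_1\beta^2)\,\bar{\mathrm{E}}\Big[\max_{1 \le j \le p}(|x_{ij}| + |y_{ij}|)^3\Big],$$

where (1) follows from $|\partial_j\partial_k\partial_l m(Z^{(i)}(t) + \tau Z_i(t))| \le U_{jkl}(Z^{(i)}(t) + \tau Z_i(t))$ together with $\sum_{j,k,l=1}^p U_{jkl} \lesssim G_3 + G_2\beta + G_1\beta^2$, which hold by Lemma A.5, and (2) is shown below. The first claim of the theorem now follows. The second claim follows directly from property (9) of the smooth max function.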
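The identity $\mathrm{E}[\dot Z_{ij}(t)Z_{ik}(t)] = 0$ used above to conclude $II = 0$ can be expanded term by term (recall also that $\frac{d}{dt}Z_{ij}(t) = \frac12\dot Z_{ij}(t)$, which is the source of the factor $\frac12$ in the interpolation formula). The cross terms vanish because $x_i$ and $y_i$ are independent and centered, and the remaining terms cancel because $\mathrm{E}[y_iy_i'] = \mathrm{E}[x_ix_i']$:

```latex
\begin{aligned}
\dot Z_{ij}(t)Z_{ik}(t)
&= \frac{1}{n}\Big(\frac{x_{ij}}{\sqrt t} - \frac{y_{ij}}{\sqrt{1-t}}\Big)
   \big(\sqrt t\,x_{ik} + \sqrt{1-t}\,y_{ik}\big)\\
&= \frac{1}{n}\Big(x_{ij}x_{ik} + \sqrt{\tfrac{1-t}{t}}\,x_{ij}y_{ik}
   - \sqrt{\tfrac{t}{1-t}}\,y_{ij}x_{ik} - y_{ij}y_{ik}\Big),\\
\mathrm{E}[\dot Z_{ij}(t)Z_{ik}(t)]
&= \frac{1}{n}\,\mathrm{E}[x_{ij}x_{ik} - y_{ij}y_{ik}] = 0 .
\end{aligned}
```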
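It remains to show (2). Define $\omega(t) = 1/(\sqrt{t} \wedge \sqrt{1 - t})$ and note that

$$\begin{aligned}
n\int_0^1 \bar{\mathrm{E}}\Big[\max_{1 \le j,k,l \le p}|\dot Z_{ij}(t)Z_{ik}(t)Z_{il}(t)|\Big]dt
&= n\int_0^1 \omega(t)\,\bar{\mathrm{E}}\Big[\max_{1 \le j,k,l \le p}|(\dot Z_{ij}(t)/\omega(t))Z_{ik}(t)Z_{il}(t)|\Big]dt\\
&\le n\int_0^1 \omega(t)\Big(\bar{\mathrm{E}}\Big[\max_{1 \le j \le p}|\dot Z_{ij}(t)/\omega(t)|^3\Big]\,\bar{\mathrm{E}}\Big[\max_{1 \le j \le p}|Z_{ij}(t)|^3\Big]\,\bar{\mathrm{E}}\Big[\max_{1 \le j \le p}|Z_{ij}(t)|^3\Big]\Big)^{1/3}dt\\
&\le n^{-1/2}\Big(\int_0^1 \omega(t)\,dt\Big)\,\bar{\mathrm{E}}\Big[\max_{1 \le j \le p}(|x_{ij}| + |y_{ij}|)^3\Big],
\end{aligned}$$

where the first inequality follows from Hölder's inequality, and the second from the facts that $|\dot Z_{ij}(t)/\omega(t)| \le (|x_{ij}| + |y_{ij}|)/\sqrt{n}$ and $|Z_{ik}(t)| \le (|x_{ik}| + |y_{ik}|)/\sqrt{n}$. Finally, we note that $\int_0^1 \omega(t)\,dt \lesssim 1$, so inequality (2) follows. This completes the overall proof. ■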
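For concreteness, the integral of the weight $\omega(t)$ can be evaluated exactly; on $[0, 1/2]$ the minimum $\sqrt t \wedge \sqrt{1-t}$ equals $\sqrt t$, and the two halves are symmetric:

```latex
\int_0^1 \omega(t)\,dt
= \int_0^{1/2} \frac{dt}{\sqrt t} + \int_{1/2}^{1} \frac{dt}{\sqrt{1-t}}
= 2\int_0^{1/2} t^{-1/2}\,dt
= 4\sqrt{1/2}
= 2\sqrt2 .
```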

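B.2. Proof of Corollary 2.1. In this proof, let $C > 0$ denote a generic constant depending only on $c_1$ and $C_1$; its value may change from place to place. For $\beta > 0$, define $e_\beta := \beta^{-1}\log p$. Recall that $S_i := \max_{1 \le j \le p}(|x_{ij}| + |y_{ij}|)$. Consider and fix a $C^3$-function $g_0 : \mathbb{R} \to [0, 1]$ such that $g_0(s) = 1$ for $s \le 0$ and $g_0(s) = 0$ for $s \ge 1$. Fix any $t \in \mathbb{R}$, and define $g(s) = g_0(\psi(s - t - e_\beta))$. For this function $g$, $G_0 = 1$, $G_1 \lesssim \psi$, $G_2 \lesssim \psi^2$ and $G_3 \lesssim \psi^3$.

Observe now that

$$\begin{aligned}
\mathrm{P}(T_0 \le t) &\le \mathrm{P}(F_\beta(X) \le t + e_\beta) \le \mathrm{E}[g(F_\beta(X))]\\
&\le \mathrm{E}[g(F_\beta(Y))] + C(\psi^3 + \beta\psi^2 + \beta^2\psi)(n^{-1/2}\bar{\mathrm{E}}[S_i^3])\\
&\le \mathrm{P}(F_\beta(Y) \le t + e_\beta + \psi^{-1}) + C(\psi^3 + \beta\psi^2 + \beta^2\psi)(n^{-1/2}\bar{\mathrm{E}}[S_i^3])\\
&\le \mathrm{P}(Z_0 \le t + e_\beta + \psi^{-1}) + C(\psi^3 + \beta\psi^2 + \beta^2\psi)(n^{-1/2}\bar{\mathrm{E}}[S_i^3]),
\end{aligned}$$

where the first inequality follows from (9), the second from the construction of $g$, the third from Theorem 2.1, the fourth from the construction of $g$, and the last from (9). The remaining step is to compare $\mathrm{P}(Z_0 \le t + e_\beta + \psi^{-1})$ with $\mathrm{P}(Z_0 \le t)$, and this is where Lemma 2.1 plays its role. By Lemma 2.1,

$$\mathrm{P}(Z_0 \le t + e_\beta + \psi^{-1}) - \mathrm{P}(Z_0 \le t) \le C(e_\beta + \psi^{-1})\sqrt{1 \vee \log(p\psi)},$$

by which we have

$$\mathrm{P}(T_0 \le t) - \mathrm{P}(Z_0 \le t) \le C\Big[(\psi^3 + \beta\psi^2 + \beta^2\psi)(n^{-1/2}\bar{\mathrm{E}}[S_i^3]) + (e_\beta + \psi^{-1})\sqrt{1 \vee \log(p\psi)}\Big].$$

We have to minimize the right side with respect to $\beta$ and $\psi$. It is reasonable to choose $\beta$ in such a way that $e_\beta$ and $\psi^{-1}$ are balanced, i.e., $\beta = \psi\log p$. With this $\beta$, the bracket on the right side is

$$\lesssim \psi^3(\log p)^2(n^{-1/2}\bar{\mathrm{E}}[S_i^3]) + \psi^{-1}\sqrt{1 \vee \log(p\psi)},$$

which is approximately minimized by $\psi = (\log p)^{-3/8}(n^{-1/2}\bar{\mathrm{E}}[S_i^3])^{-1/4}$. With this $\psi$, $\psi \le (n^{-1/2}\bar{\mathrm{E}}[S_i^3])^{-1/4} \le Cn^{1/8}$ (recall that $p \ge 3$), and hence $\log(p\psi) \le C\log(pn)$. Therefore,

$$\mathrm{P}(T_0 \le t) - \mathrm{P}(Z_0 \le t) \le C(n^{-1/2}\bar{\mathrm{E}}[S_i^3])^{1/4}(\log(pn))^{7/8}.$$

This gives one half of the claim. The other half follows similarly. ■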
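The arithmetic behind the final rate can be checked directly. Writing $a := n^{-1/2}\bar{\mathrm{E}}[S_i^3]$ and $L := \log p$ (shorthand introduced only here), and using $L \le \log(pn)$ and $\log(p\psi) \le C\log(pn)$, both terms of the bracket are of the same order at the chosen $\psi$:

```latex
\begin{aligned}
&\psi = L^{-3/8}a^{-1/4}:\\
&\psi^3L^2a = L^{-9/8}a^{-3/4}\,L^2\,a = L^{7/8}a^{1/4} \le a^{1/4}(\log(pn))^{7/8},\\
&\psi^{-1}\sqrt{1\vee\log(p\psi)} \le C\,L^{3/8}a^{1/4}(\log(pn))^{1/2}
  \le C\,a^{1/4}(\log(pn))^{7/8}.
\end{aligned}
```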

B.3. Proof of Theorem 2.2. The second claim of the theorem follows from property (9) of the smooth max function. Hence we shall prove the first claim. The proof strategy is similar to the proof of Theorem 2.1. However, to control the third-order terms in the leave-one-out expansions effectively, we shall use truncation and replace $X$ and $Y$ by their truncated versions $\tilde X$ and $\tilde Y$, defined as follows: let $\tilde x_i = (\tilde x_{ij})_{j=1}^p$, where $\tilde x_{ij}$ was defined before the statement of the theorem, and define the truncated version of $X$ as $\tilde X = n^{-1/2}\sum_{i=1}^n \tilde x_i$. Also let

$$\tilde y_i := (\tilde y_{ij})_{j=1}^p, \qquad \tilde y_{ij} := y_{ij}1\{|y_{ij}| \le u(\mathrm{E}[y_{ij}^2])^{1/2}\}, \qquad \tilde Y := \frac{1}{\sqrt{n}}\sum_{i=1}^n \tilde y_i.$$

Note that by the symmetry of the distribution of $y_{ij}$, $\mathrm{E}[\tilde y_{ij}] = 0$. Recall that we are assuming that the sequences $(x_i)_{i=1}^n$ and $(y_i)_{i=1}^n$ are independent.

The proof consists of four steps. Step 1 will show that we can replace $X$ by $\tilde X$ and $Y$ by $\tilde Y$. Step 2 will bound the difference of the expectations of the relevant functions of $\tilde X$ and $\tilde Y$. This is the main step of the proof. Steps 3 and 4 will carry out supporting calculations. The steps of the proof will also call on various technical lemmas collected in Appendix A.

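Step 1. Let $m := g \circ F_\beta$. The main goal is to bound $\mathrm{E}[m(X) - m(Y)]$. Define

$$\mathcal{X} := 1\Big\{\max_{1 \le j \le p}|X_j - \tilde X_j| \le \Delta(\gamma, u) \text{ and } \max_{1 \le j \le p}|Y_j - \tilde Y_j| \le \Delta(\gamma, u)\Big\},$$

where $\Delta(\gamma, u) := 5M_2\varphi(u)\sqrt{2\log(p/\gamma)}$. By Lemma A.7, we have $\mathrm{E}[\mathcal{X}] \ge 1 - 10\gamma$. Observe that by Lemma A.3,

$$|m(x) - m(y)| \le G_1|F_\beta(x) - F_\beta(y)| \le G_1\max_{1 \le j \le p}|x_j - y_j|,$$

so that

$$\begin{aligned}
|\mathrm{E}[m(X) - m(\tilde X)]| &\le |\mathrm{E}[(m(X) - m(\tilde X))\mathcal{X}]| + |\mathrm{E}[(m(X) - m(\tilde X))(1 - \mathcal{X})]| \lesssim G_1\Delta(\gamma, u) + G_0\gamma,\\
|\mathrm{E}[m(Y) - m(\tilde Y)]| &\le |\mathrm{E}[(m(Y) - m(\tilde Y))\mathcal{X}]| + |\mathrm{E}[(m(Y) - m(\tilde Y))(1 - \mathcal{X})]| \lesssim G_1\Delta(\gamma, u) + G_0\gamma,
\end{aligned}$$

and hence

$$|\mathrm{E}[m(X) - m(Y)]| \lesssim |\mathrm{E}[m(\tilde X) - m(\tilde Y)]| + G_1\Delta(\gamma, u) + G_0\gamma.$$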
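Step 2 (Main Step). The purpose of this step is to establish the bound

$$|\mathrm{E}[m(\tilde X) - m(\tilde Y)]| \lesssim n^{-1/2}(G_3 + G_2\beta + G_1\beta^2)M_3^3 + (G_2 + \beta G_1)M_2^2\varphi(u).$$

Define, as in the proof of Theorem 2.1,

$$\tilde Z(t) := \sqrt{t}\tilde X + \sqrt{1 - t}\tilde Y = \sum_{i=1}^n \tilde Z_i(t), \qquad \tilde Z_i(t) := \frac{1}{\sqrt{n}}(\sqrt{t}\tilde x_i + \sqrt{1 - t}\tilde y_i),$$
$$\tilde Z^{(i)}(t) := \tilde Z(t) - \tilde Z_i(t), \qquad \dot{\tilde Z}_{ij}(t) := \frac{1}{\sqrt{n}}\Big(\frac{\tilde x_{ij}}{\sqrt{t}} - \frac{\tilde y_{ij}}{\sqrt{1 - t}}\Big).$$

Arguing as in the proof of Theorem 2.1, we have

$$\mathrm{E}[m(\tilde X) - m(\tilde Y)] = \frac{1}{2}\sum_{j=1}^p\sum_{i=1}^n\int_0^1 \mathrm{E}[\partial_j m(\tilde Z(t))\dot{\tilde Z}_{ij}(t)]\,dt = \frac{1}{2}(I + II + III),$$

where

$$I := \sum_{j=1}^p\sum_{i=1}^n\int_0^1 \mathrm{E}[\partial_j m(\tilde Z^{(i)}(t))\dot{\tilde Z}_{ij}(t)]\,dt,$$
$$II := \sum_{j,k=1}^p\sum_{i=1}^n\int_0^1 \mathrm{E}[\partial_j\partial_k m(\tilde Z^{(i)}(t))\dot{\tilde Z}_{ij}(t)\tilde Z_{ik}(t)]\,dt,$$
$$III := \sum_{j,k,l=1}^p\sum_{i=1}^n\int_0^1\int_0^1 (1 - \tau)\,\mathrm{E}[\partial_j\partial_k\partial_l m(\tilde Z^{(i)}(t) + \tau\tilde Z_i(t))\dot{\tilde Z}_{ij}(t)\tilde Z_{ik}(t)\tilde Z_{il}(t)]\,d\tau\,dt.$$

By independence of $\tilde Z^{(i)}(t)$ and $\dot{\tilde Z}_{ij}(t)$, together with the fact that $\mathrm{E}[\dot{\tilde Z}_{ij}(t)] = 0$, we have $I = 0$. Moreover, in Steps 3 and 4 below, we will show that

$$|II| \lesssim (G_2 + \beta G_1)M_2^2\varphi(u), \qquad |III| \lesssim n^{-1/2}(G_3 + G_2\beta + G_1\beta^2)M_3^3.$$

The claim of this step now follows.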

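Step 3 (Bound on $II$). By independence of $\tilde Z^{(i)}(t)$ and $\dot{\tilde Z}_{ij}(t)\tilde Z_{ik}(t)$,

$$\begin{aligned}
|II| &= \Big|\sum_{j,k=1}^p\sum_{i=1}^n\int_0^1 \mathrm{E}[\partial_j\partial_k m(\tilde Z^{(i)}(t))]\,\mathrm{E}[\dot{\tilde Z}_{ij}(t)\tilde Z_{ik}(t)]\,dt\Big|\\
&\le \sum_{j,k=1}^p\sum_{i=1}^n\int_0^1 \mathrm{E}[|\partial_j\partial_k m(\tilde Z^{(i)}(t))|]\cdot|\mathrm{E}[\dot{\tilde Z}_{ij}(t)\tilde Z_{ik}(t)]|\,dt\\
&\le \sum_{j,k=1}^p\sum_{i=1}^n\int_0^1 \mathrm{E}[U_{jk}(\tilde Z^{(i)}(t))]\cdot|\mathrm{E}[\dot{\tilde Z}_{ij}(t)\tilde Z_{ik}(t)]|\,dt,
\end{aligned}$$

where the last step follows from Lemma A.5. Since $|\sqrt{t}\tilde x_{ij} + \sqrt{1 - t}\tilde y_{ij}| \le 2\sqrt{2}uM_2$, we have $|\beta(\sqrt{t}\tilde x_{ij} + \sqrt{1 - t}\tilde y_{ij})/\sqrt{n}| \le 1$ (which is guaranteed by the assumption $\beta 2\sqrt{2}uM_2/\sqrt{n} \le 1$), so by Lemmas A.6 and A.5 the last expression is bounded by

$$\begin{aligned}
&\lesssim \sum_{j,k=1}^p\sum_{i=1}^n\int_0^1 \mathrm{E}[U_{jk}(\tilde Z(t))]\cdot|\mathrm{E}[\dot{\tilde Z}_{ij}(t)\tilde Z_{ik}(t)]|\,dt\\
&\le \int_0^1\Big(\sum_{j,k=1}^p \mathrm{E}[U_{jk}(\tilde Z(t))]\Big)\max_{1 \le j,k \le p}\sum_{i=1}^n|\mathrm{E}[\dot{\tilde Z}_{ij}(t)\tilde Z_{ik}(t)]|\,dt\\
&\lesssim (G_2 + G_1\beta)\int_0^1 \max_{1 \le j,k \le p}\sum_{i=1}^n|\mathrm{E}[\dot{\tilde Z}_{ij}(t)\tilde Z_{ik}(t)]|\,dt.
\end{aligned}$$

Observe that since $\mathrm{E}[x_{ij}x_{ik}] = \mathrm{E}[y_{ij}y_{ik}]$, we have that $\mathrm{E}[\dot{\tilde Z}_{ij}(t)\tilde Z_{ik}(t)] = n^{-1}\mathrm{E}[\tilde x_{ij}\tilde x_{ik} - \tilde y_{ij}\tilde y_{ik}] = n^{-1}\mathrm{E}[\tilde x_{ij}\tilde x_{ik} - x_{ij}x_{ik}] + n^{-1}\mathrm{E}[y_{ij}y_{ik} - \tilde y_{ij}\tilde y_{ik}]$, so that by Lemma A.7(b),

$$\sum_{i=1}^n|\mathrm{E}[\dot{\tilde Z}_{ij}(t)\tilde Z_{ik}(t)]| \le \bar{\mathrm{E}}[|\tilde x_{ij}\tilde x_{ik} - x_{ij}x_{ik}|] + \bar{\mathrm{E}}[|\tilde y_{ij}\tilde y_{ik} - y_{ij}y_{ik}|] \lesssim (\bar{\mathrm{E}}[x_{ij}^2] + \bar{\mathrm{E}}[x_{ik}^2])\varphi(u) \lesssim M_2^2\varphi(u).$$

Therefore, we conclude that $|II| \lesssim (G_2 + G_1\beta)M_2^2\varphi(u)$.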

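Step 4 (Bound on $III$). Observe that

$$\begin{aligned}
|III| &\le_{(1)} \sum_{j,k,l=1}^p\sum_{i=1}^n\int_0^1\int_0^1 \mathrm{E}[U_{jkl}(\tilde Z^{(i)}(t) + \tau\tilde Z_i(t))\,|\dot{\tilde Z}_{ij}(t)\tilde Z_{ik}(t)\tilde Z_{il}(t)|]\,d\tau\,dt\\
&\lesssim_{(2)} \sum_{j,k,l=1}^p\sum_{i=1}^n\int_0^1 \mathrm{E}[U_{jkl}(\tilde Z^{(i)}(t))\,|\dot{\tilde Z}_{ij}(t)\tilde Z_{ik}(t)\tilde Z_{il}(t)|]\,dt\\
&=_{(3)} \sum_{j,k,l=1}^p\sum_{i=1}^n\int_0^1 \mathrm{E}[U_{jkl}(\tilde Z^{(i)}(t))]\cdot\mathrm{E}[|\dot{\tilde Z}_{ij}(t)\tilde Z_{ik}(t)\tilde Z_{il}(t)|]\,dt,
\end{aligned}\tag{20}$$

where (1) follows from $|\partial_j\partial_k\partial_l m(z)| \le U_{jkl}(z)$ (see Lemma A.5), (2) from Lemma A.6, and (3) from the independence of $\tilde Z^{(i)}(t)$ and $\dot{\tilde Z}_{ij}(t)\tilde Z_{ik}(t)\tilde Z_{il}(t)$. Moreover, the last expression is bounded as follows:

$$\begin{aligned}
\text{right side of (20)} &\lesssim_{(4)} \sum_{j,k,l=1}^p\sum_{i=1}^n\int_0^1 \mathrm{E}[U_{jkl}(\tilde Z(t))]\cdot\mathrm{E}[|\dot{\tilde Z}_{ij}(t)\tilde Z_{ik}(t)\tilde Z_{il}(t)|]\,dt\\
&=_{(5)} \sum_{j,k,l=1}^p\int_0^1 \mathrm{E}[U_{jkl}(\tilde Z(t))]\cdot n\bar{\mathrm{E}}[|\dot{\tilde Z}_{ij}(t)\tilde Z_{ik}(t)\tilde Z_{il}(t)|]\,dt\\
&\le_{(6)} \int_0^1\Big(\sum_{j,k,l=1}^p \mathrm{E}[U_{jkl}(\tilde Z(t))]\Big)\max_{1 \le j,k,l \le p}n\bar{\mathrm{E}}[|\dot{\tilde Z}_{ij}(t)\tilde Z_{ik}(t)\tilde Z_{il}(t)|]\,dt\\
&\lesssim_{(7)} (G_3 + G_2\beta + G_1\beta^2)\int_0^1 \max_{1 \le j,k,l \le p}n\bar{\mathrm{E}}[|\dot{\tilde Z}_{ij}(t)\tilde Z_{ik}(t)\tilde Z_{il}(t)|]\,dt,
\end{aligned}$$

where (4) follows from Lemma A.6, (5) from the definition of $\bar{\mathrm{E}}$, (6) from a trivial inequality, and (7) from Lemma A.5. We have to bound the integral on the last line. Let $\omega(t) = 1/(\sqrt{t} \wedge \sqrt{1 - t})$, and observe that

$$\begin{aligned}
\int_0^1 \max_{1 \le j,k,l \le p}n\bar{\mathrm{E}}[|\dot{\tilde Z}_{ij}(t)\tilde Z_{ik}(t)\tilde Z_{il}(t)|]\,dt
&= \int_0^1 \omega(t)\max_{1 \le j,k,l \le p}n\bar{\mathrm{E}}[|(\dot{\tilde Z}_{ij}(t)/\omega(t))\tilde Z_{ik}(t)\tilde Z_{il}(t)|]\,dt\\
&\le n\int_0^1 \omega(t)\max_{1 \le j,k,l \le p}\big(\bar{\mathrm{E}}[|\dot{\tilde Z}_{ij}(t)/\omega(t)|^3]\,\bar{\mathrm{E}}[|\tilde Z_{ik}(t)|^3]\,\bar{\mathrm{E}}[|\tilde Z_{il}(t)|^3]\big)^{1/3}dt,
\end{aligned}$$

where the last inequality is by Hölder's inequality. The last term is further bounded as

$$\begin{aligned}
&\le_{(1)} n^{-1/2}\Big(\int_0^1 \omega(t)\,dt\Big)\max_{1 \le j \le p}\bar{\mathrm{E}}[(|\tilde x_{ij}| + |\tilde y_{ij}|)^3]\\
&\lesssim_{(2)} n^{-1/2}\max_{1 \le j \le p}\big[(\bar{\mathrm{E}}[|\tilde x_{ij}|^3])^{1/3} + (\bar{\mathrm{E}}[|\tilde y_{ij}|^3])^{1/3}\big]^3\\
&\lesssim_{(3)} n^{-1/2}\max_{1 \le j \le p}\big[(\bar{\mathrm{E}}[|x_{ij}|^3])^{1/3} + (\bar{\mathrm{E}}[|y_{ij}|^3])^{1/3}\big]^3\\
&\lesssim_{(4)} n^{-1/2}\max_{1 \le j \le p}\bar{\mathrm{E}}[|x_{ij}|^3],
\end{aligned}$$

where (1) follows from the facts that $|\dot{\tilde Z}_{ij}(t)/\omega(t)| \le (|\tilde x_{ij}| + |\tilde y_{ij}|)/\sqrt{n}$ and $|\tilde Z_{im}(t)| \le (|\tilde x_{im}| + |\tilde y_{im}|)/\sqrt{n}$, and that the product of $(\bar{\mathrm{E}}[(|\tilde x_{ij}| + |\tilde y_{ij}|)^3])^{1/3}$, $(\bar{\mathrm{E}}[(|\tilde x_{ik}| + |\tilde y_{ik}|)^3])^{1/3}$ and $(\bar{\mathrm{E}}[(|\tilde x_{il}| + |\tilde y_{il}|)^3])^{1/3}$ is trivially bounded by $\max_{1 \le j \le p}\bar{\mathrm{E}}[(|\tilde x_{ij}| + |\tilde y_{ij}|)^3]$; (2) follows from $\int_0^1 \omega(t)\,dt \lesssim 1$ and Minkowski's inequality; (3) from Lemma A.7(a); and (4) from the normality of $y_{ij}$ with $\mathrm{E}[y_{ij}^2] = \mathrm{E}[x_{ij}^2]$, so that $\mathrm{E}[|y_{ij}|^3] \lesssim (\mathrm{E}[y_{ij}^2])^{3/2} = (\mathrm{E}[x_{ij}^2])^{3/2} \le \mathrm{E}[|x_{ij}|^3]$. This completes the overall proof. ■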

B.4. Proof of Corollary 2.2. See Supplemental Appendix F.2. ■

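B.5. Proof of Lemma 2.2. Since $\mathrm{E}[x_{ij}^2] \ge c_1$ by assumption, we have $1\{|x_{ij}| > u(\mathrm{E}[x_{ij}^2])^{1/2}\} \le 1\{|x_{ij}| > c_1^{1/2}u\}$. By Markov's inequality and the condition of the lemma, we have

$$\begin{aligned}
\mathrm{P}\big(|x_{ij}| > u(\mathrm{E}[x_{ij}^2])^{1/2} \text{ for some } (i, j)\big) &\le \sum_{i=1}^n \mathrm{P}\Big(\max_{1 \le j \le p}|x_{ij}| > c_1^{1/2}u\Big)\\
&\le \sum_{i=1}^n \mathrm{P}\Big[h\Big(\max_{1 \le j \le p}|x_{ij}|/D\Big) > h(c_1^{1/2}u/D)\Big] \le n/h(c_1^{1/2}u/D).
\end{aligned}$$

This implies $u_x(\gamma) \le c_1^{-1/2}Dh^{-1}(n/\gamma)$. For $u_y(\gamma)$, by $y_{ij} \sim N(0, \mathrm{E}[x_{ij}^2])$ with $\mathrm{E}[x_{ij}^2] \le B^2$, we have $\mathrm{E}[\exp(y_{ij}^2/(4B^2))] \lesssim 1$. Hence

$$\begin{aligned}
\mathrm{P}\big(|y_{ij}| > u(\mathrm{E}[y_{ij}^2])^{1/2} \text{ for some } (i, j)\big) &\le \sum_{i=1}^n\sum_{j=1}^p \mathrm{P}(|y_{ij}| > c_1^{1/2}u)\\
&= \sum_{i=1}^n\sum_{j=1}^p \mathrm{P}\big(|y_{ij}|/(2B) > c_1^{1/2}u/(2B)\big) \lesssim np\exp(-c_1u^2/(4B^2)).
\end{aligned}$$

Therefore, $u_y(\gamma) \le CB\sqrt{\log(pn/\gamma)}$, where $C > 0$ depends only on $c_1$. ■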
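The Gaussian moment bound used for $u_y(\gamma)$ can be made explicit. For $y \sim N(0, \sigma^2)$ and $0 \le \theta < 1/(2\sigma^2)$ one has $\mathrm{E}[e^{\theta y^2}] = (1 - 2\theta\sigma^2)^{-1/2}$; taking $\theta = 1/(4B^2)$ with $\sigma^2 \le B^2$ and applying Markov's inequality and the union bound gives:

```latex
\begin{aligned}
\mathrm{E}\big[\exp(y_{ij}^2/(4B^2))\big]
&= \Big(1 - \frac{\mathrm{E}[x_{ij}^2]}{2B^2}\Big)^{-1/2} \le \sqrt2,\\
\mathrm{P}(|y_{ij}| > c_1^{1/2}u)
&\le \sqrt2\,\exp\big(-c_1u^2/(4B^2)\big),\\
\sum_{i=1}^n\sum_{j=1}^p \mathrm{P}(|y_{ij}| > c_1^{1/2}u) \le \gamma
&\quad\text{whenever}\quad
u \ge 2B\,c_1^{-1/2}\sqrt{\log(\sqrt2\,np/\gamma)} .
\end{aligned}
```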
B.6. Proof of Corollary 2.3. Case (i) follows directly from Corollary 2.1. Hence we only consider cases (ii)-(v).

Step 1. In this step, in each case of conditions (E.2)-(E.5), we shall compute the following bounds on the moments $M_3$ and $M_4$ and on the parameters $B$ and $D$ in Lemma 2.2, with a specific choice of $h$:

(E.2) $B \vee M_3^3 \vee M_4^2 \le C$, $D \le C\log p$, $h(v) = e^v - 1$;
(E.3) $B = B_n$, $D \le CB_n$, $M_3^3 \vee M_4^2 \le CB_n$, $h(v) = e^v - 1$;
(E.4) $B \vee M_3^3 \vee M_4^2 \le CB_n$, $D \le CB_n\log p$, $h(v) = e^v - 1$;
(E.5) $B \vee D \vee M_3^3 \vee M_4^2 \le CB_n$, $h(v) = v^4$.

Here $C > 0$ is a (sufficiently large) constant that depends only on $c_1$ and $C_1$. The bounds on $B$, $M_3$ and $M_4$ follow from elementary computations using Hölder's inequality. The bounds on $D$ follow from an elementary application of Lemma 2.2.2 in [44]. For brevity, we omit the details.

Step 2. In each of cases (ii)-(v), there are sufficiently small constants $c_3 > 0$ and $c_4 > 0$, and a sufficiently large constant $C_2 > 0$, depending only on $c_1$, $c_2$, $C_1$, such that, with $\ell_n := \log(pn^{1 + c_4})$,

$$n^{-1/2}\ell_n^{3/2}\max\{B\ell_n^{1/2}, Dh^{-1}(n^{1 + c_4})\} \le C_2n^{-c_3}.$$

Hence, taking $\gamma = n^{-c_4}$, we conclude from Corollary 2.2 and Lemma 2.2 that $\rho \le Cn^{-\min\{c_3, c_4\}}$, where $C > 0$ depends only on $c_1$, $c_2$, $C_1$. ■

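Appendix C. Proofs for Section 3

C.1. Proof of Lemma 3.2. Recall that $\Delta = \max_{1 \le j,k \le p}|\mathbb{E}_n[x_{ij}x_{ik}] - \bar{\mathrm{E}}[x_{ij}x_{ik}]|$. By Lemma 3.1, on the event $\{(x_i)_{i=1}^n : \Delta \le \vartheta\}$, we have $|\mathrm{P}(Z_0 \le t) - \mathrm{P}_e(W_0 \le t)| \le \pi(\vartheta)$ for all $t \in \mathbb{R}$, and so on this event

$$\mathrm{P}_e\big(W_0 \le c_{Z_0}(\alpha + \pi(\vartheta))\big) \ge \mathrm{P}\big(Z_0 \le c_{Z_0}(\alpha + \pi(\vartheta))\big) - \pi(\vartheta) \ge \alpha + \pi(\vartheta) - \pi(\vartheta) = \alpha,$$

implying the first claim. The second claim follows similarly. ■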

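C.2. Proof of Lemma 3.3. By equation (15), the probability of the event $\{(x_i)_{i=1}^n : \mathrm{P}_e(|W - W_0| > \zeta_1) \le \zeta_2\}$ is at least $1 - \zeta_2$. On this event,

$$\mathrm{P}_e\big(W \le c_{W_0}(\alpha + \zeta_2) + \zeta_1\big) \ge \mathrm{P}_e\big(W_0 \le c_{W_0}(\alpha + \zeta_2)\big) - \zeta_2 \ge \alpha + \zeta_2 - \zeta_2 = \alpha,$$

implying that $\mathrm{P}(c_W(\alpha) \le c_{W_0}(\alpha + \zeta_2) + \zeta_1) \ge 1 - \zeta_2$. The second claim of the lemma follows similarly. ■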

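C.3. Proof of Theorem 3.1. For $\vartheta > 0$, let $\pi(\vartheta) := C_2\vartheta^{1/3}(1 \vee \log(p/\vartheta))^{2/3}$ as defined in Lemma 3.2. Then

$$\mathrm{P}(T_0 \le c_{W_0}(\alpha)) \le_{(1)} \mathrm{P}\big(T_0 \le c_{Z_0}(\alpha + \pi(\vartheta))\big) + \mathrm{P}(\Delta > \vartheta) \le_{(2)} \alpha + \pi(\vartheta) + \mathrm{P}(\Delta > \vartheta) + \rho,$$

where (1) follows from Lemma 3.2 and (2) follows from the definition of $\rho$ and the fact that $Z_0$ has no point masses. The upper bound is proven. The lower bound follows similarly. ■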
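C.4. Proof of Theorem 3.3. For $\vartheta > 0$, let $\pi(\vartheta) := C_2\vartheta^{1/3}(1 \vee \log(p/\vartheta))^{2/3}$ with $C_2 > 0$ as in Lemma 3.2. Then

$$\begin{aligned}
\mathrm{P}(T \le c_W(\alpha)) &\le_{(1)} \mathrm{P}(T_0 \le c_W(\alpha) + \zeta_1) + \zeta_2\\
&\le_{(2)} \mathrm{P}(T_0 \le c_{W_0}(\alpha + \zeta_2) + 2\zeta_1) + 2\zeta_2\\
&\le_{(3)} \mathrm{P}\big(T_0 \le c_{Z_0}(\alpha + \zeta_2 + \pi(\vartheta)) + 2\zeta_1\big) + 2\zeta_2 + \mathrm{P}(\Delta > \vartheta)\\
&\le_{(4)} \mathrm{P}\big(Z_0 \le c_{Z_0}(\alpha + \zeta_2 + \pi(\vartheta)) + 2\zeta_1\big) + \rho + 2\zeta_2 + \mathrm{P}(\Delta > \vartheta)\\
&\le_{(5)} \mathrm{P}\big(Z_0 \le c_{Z_0}(\alpha + \zeta_2 + \pi(\vartheta))\big) + C_3\zeta_1\sqrt{1 \vee \log(p/\zeta_1)} + \rho + 2\zeta_2 + \mathrm{P}(\Delta > \vartheta)\\
&=_{(6)} \alpha + \zeta_2 + \pi(\vartheta) + C_3\zeta_1\sqrt{1 \vee \log(p/\zeta_1)} + 2\zeta_2 + \mathrm{P}(\Delta > \vartheta) + \rho,
\end{aligned}$$

where $C_3 > 0$ depends on $c_1$ and $C_1$ only, and where (1) follows from equation (14), (2) from Lemma 3.3, (3) from Lemma 3.2, (4) from the definition of $\rho$, (5) from Lemma 2.1 on anti-concentration, and (6) from the fact that $Z_0$ has no point masses. This gives the upper bound. The lower bound follows similarly. ■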

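C.5. Proof of Corollary 3.1. The proof of this corollary relies on the following lemma.

Lemma C.1. Recall conditions (E.2)-(E.5) in Section 2.2. Then

$$\mathrm{E}[\Delta] \le C \times \begin{cases}
\sqrt{(\log p)/n} \ \vee\ (\log(pn))^2(\log p)/n & \text{under (E.2)},\\
\sqrt{B_n^2(\log p)/n} \ \vee\ B_n^2(\log(pn))^2(\log p)/n & \text{under (E.3) or (E.4)},\\
\sqrt{B_n^2(\log p)/n} \ \vee\ B_n^2(\log p)/n^{1/2} & \text{under (E.5)},
\end{cases}$$

where $C > 0$ depends only on $c_1$ and $C_1$ that appear in (E.2)-(E.5).

Proof. By Lemma A.1 and Hölder's inequality, we have

$$\mathrm{E}[\Delta] \lesssim M_4^2\sqrt{(\log p)/n} + \Big(\mathrm{E}\Big[\max_{i,j}|x_{ij}|^4\Big]\Big)^{1/2}(\log p)/n.$$

The conclusion of the lemma follows from elementary calculations with the help of Lemma 2.2.2 in [44]. ■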



Proof of Corollary 3.1. We make use of Theorem 3.2. Let $c > 0$ and $C > 0$ denote generic constants depending only on $c_1$, $c_2$, $C_1$; their values may change from place to place. By Corollary 2.3, in each of cases (i)-(iv), $\rho \le Cn^{-c}$. Moreover, $\zeta_1\sqrt{\log p} \le C_1n^{-c_2}$ implies that $\zeta_1 \le C_1n^{-c_2}$ (recall $p \ge 3$), and hence $\zeta_1\sqrt{\log(p/\zeta_1)} \le Cn^{-c}$. Also, $\zeta_2 \le C_1n^{-c_2}$ by assumption.

Let $\vartheta = \vartheta_n := (\mathrm{E}[\Delta])^{1/2}/\log p$. By Lemma C.1, $\mathrm{E}[\Delta](\log p)^2 \le Cn^{-c}$. Therefore, $\pi(\vartheta) \le Cn^{-c}$ (with possibly different $c, C > 0$). In addition, by Markov's inequality, $\mathrm{P}(\Delta > \vartheta) \le \mathrm{E}[\Delta]/\vartheta = (\mathrm{E}[\Delta](\log p)^2)^{1/2} \le Cn^{-c}$. Hence, by that theorem, we have $\sup_{\alpha \in (0,1)}|\mathrm{P}(T \le c_W(\alpha)) - \alpha| \le Cn^{-c}$. ■
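Appendix D. Proofs for Section 4

D.1. Proof of Theorem 4.1. The proof proceeds in three steps. In the proof, $(\hat\beta, \lambda)$ denotes $(\hat\beta^{(k)}, \lambda^{(k)})$ with $k$ either 0 or 1.

Step 1. Here we show that there exist some constants $c > 0$ and $C > 0$ (depending only on $c_1$, $C_1$ and $\sigma^2$) such that for either $k \in \{0, 1\}$,

$$\mathrm{P}(T_0 \le \lambda^{(k)}) \ge 1 - \alpha - \nu_n, \tag{21}$$

with $\nu_n = Cn^{-c}$. We first note that $T_0 = \sqrt{n}\max_{1 \le k \le 2p}\mathbb{E}_n[\tilde z_{ik}\varepsilon_i]$, where $\tilde z_i = (z_i', -z_i')'$. Application of Corollary 2.3-(v) gives

$$|\mathrm{P}(T_0 \le \lambda) - \mathrm{P}(Z_0 \le \lambda)| \le Cn^{-c},$$

where $c > 0$ and $C > 0$ are constants depending only on $c_1$, $C_1$ and $\sigma^2$. Since $\lambda \ge c_{Z_0}(1 - \alpha)$, the claim follows. Indeed, $\lambda^{(1)} = c_{Z_0}(1 - \alpha)$, and $\lambda^{(1)} \le \lambda^{(0)} = c_0(1 - \alpha) := \sigma\Phi^{-1}(1 - \alpha/(2p))$, since by the union bound

$$\mathrm{P}(Z_0 \ge c_0(1 - \alpha)) \le 2p\,\mathrm{P}(\sigma N(0, 1) \ge c_0(1 - \alpha)) = \alpha.$$

Step 2. We claim that with probability $\ge 1 - \alpha - \nu_n$, $\delta = \hat\beta - \beta$ obeys

$$\sqrt{n}\max_{1 \le j \le p}|\mathbb{E}_n[z_{ij}(z_i'\delta)]| \le 2\lambda.$$

Indeed, by definition of $\hat\beta$, $\sqrt{n}\max_{1 \le j \le p}|\mathbb{E}_n[z_{ij}(y_i - z_i'\hat\beta)]| \le \lambda$, which by the triangle inequality implies $\sqrt{n}\max_{1 \le j \le p}|\mathbb{E}_n[z_{ij}(z_i'\delta)]| \le T_0 + \lambda$. The claim follows from Step 1.

Step 3. By Step 1, with probability $\ge 1 - \alpha - \nu_n$, the true value $\beta$ obeys the constraint in optimization problem (16), in which case, by definition of $\hat\beta$, $\|\hat\beta\|_{\ell_1} \le \|\beta\|_{\ell_1}$. Therefore, with the same probability, $\delta \in \mathcal{R}(\beta) = \{\delta \in \mathbb{R}^p : \|\beta + \delta\|_{\ell_1} \le \|\beta\|_{\ell_1}\}$. By definition of $\kappa_I(\beta)$ we have that

$$\kappa_I(\beta)\|\delta\|_I \le \max_{1 \le j \le p}|\mathbb{E}_n[z_{ij}(z_i'\delta)]|.$$

Combining this inequality with Step 2 gives the claim of the theorem. ■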

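D.2. Proof of Theorem 4.2. The proof has four steps. In the proof, we let $\varrho_n = Cn^{-c}$ for sufficiently small $c > 0$ and sufficiently large $C > 0$ depending only on $c_1$, $C_1$, $\underline\sigma^2$, $\bar\sigma^2$, where $c$ and $C$ (and hence $\varrho_n$) may change from place to place.

Step 0. The same argument as in the previous proof applies to $\hat\beta^{(0)}$ with $\lambda = \lambda^{(0)} := c_0(1 - 1/n)$, where now $\bar\sigma^2$ is the upper bound on $\mathrm{E}[\varepsilon_i^2]$. Thus, we conclude that with probability at least $1 - \varrho_n$,

$$\|\hat\beta^{(0)} - \beta\|_{pr} \le \frac{2c_0(1 - 1/n)}{\sqrt{n}\,\kappa_{pr}(\beta)}.$$

Step 1. We claim that with probability at least $1 - \varrho_n$,

$$\max_{1 \le j \le p}\big(\mathbb{E}_n[z_{ij}^2(\hat\varepsilon_i - \varepsilon_i)^2]\big)^{1/2} \le B_n\frac{2c_0(1 - 1/n)}{\sqrt{n}\,\kappa_{pr}(\beta)} =: \ell_n.$$

Application of Hölder's inequality and the identity $\hat\varepsilon_i - \varepsilon_i = z_i'(\beta - \hat\beta^{(0)})$ gives

$$\max_{1 \le j \le p}\big(\mathbb{E}_n[z_{ij}^2(\hat\varepsilon_i - \varepsilon_i)^2]\big)^{1/2} \le B_n\big(\mathbb{E}_n[(z_i'(\hat\beta^{(0)} - \beta))^2]\big)^{1/2} = B_n\|\hat\beta^{(0)} - \beta\|_{pr}.$$

The claim follows from Step 0.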

Step 2. In this step, we apply Corollary l3.H -(iv) to 

T = Tn = \/n max E„[zjiejl, W = Jn max E„[zj,-ejeil, and 

Wo = yfn max E„[ijjejei], 

where Zi = {z[, —z[)', to conclude that uniformly in a G (0, 1) 
(22) P(ro ^ CH/(1- a)) ^ 

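To show applicability of Corollary 3.1-(iv), we note that for any $\zeta_1 > 0$,

$$\mathrm{P}_e(|W - W_0| > \zeta_1) \leq \mathbb{E}_e[|W - W_0|]/\zeta_1 \leq \sqrt{n}\,\mathbb{E}_e\Big[\max_{1\leq j\leq p}|\mathbb{E}_n[z_{ij}(\hat\varepsilon_i - \varepsilon_i)e_i]|\Big]/\zeta_1 \lesssim \sqrt{\log p}\max_{1\leq j\leq p}\left(\mathbb{E}_n[z_{ij}^2(\hat\varepsilon_i - \varepsilon_i)^2]\right)^{1/2}/\zeta_1,$$

where the third inequality is due to Pisier's inequality. The last quantity is bounded by $(L_n\log p)^{1/2}/\zeta_1$ with probability $\geq 1 - \varrho_n$ by Step 1.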

Since $L_n\log p \leq C_1 n^{-c_1}$ by assumption (vi) of the theorem, we can take $\zeta_1$ in such a way that $\zeta_1(\log p)^{1/2} \leq \varrho_n$ and $(L_n\log p)^{1/2}/\zeta_1 \leq \varrho_n$. Then all the conditions of Corollary 3.1-(iv) with so defined $\zeta_1$ and $\zeta_2 = \varrho_n \vee ((L_n\log p)^{1/2}/\zeta_1)$ are satisfied, and hence application of the corollary gives that uniformly in $\alpha \in (0,1)$,

(23) $|\mathrm{P}(T_0 \leq c_W(1-\alpha)) - (1-\alpha)| \leq \varrho_n$,

which implies the claim of this step.

Step 3. In this step we claim that with probability at least $1 - \varrho_n$,

$$c_W(1-\alpha) \leq c_{Z_0}(1-\alpha+2\varrho_n).$$

Combining Step 2 and Lemma 3.3 gives that with probability at least $1-\zeta_2$, $c_W(1-\alpha) \leq c_{W_0}(1-\alpha+\zeta_2) + \zeta_1$, where $\zeta_1$ and $\zeta_2$ are chosen as in Step 2. In addition, Lemma 3.2 shows that $c_{W_0}(1-\alpha+\zeta_2) \leq c_{Z_0}(1-\alpha+\varrho_n)$. Finally, Lemma 2.1 yields $c_{Z_0}(1-\alpha+\varrho_n) + \zeta_1 \leq c_{Z_0}(1-\alpha+2\varrho_n)$. Combining these bounds gives the claim of this step.

Step 4. Given (22), the rest of the proof is identical to Steps 2-3 in the proof of Theorem 4.1 with $\lambda = c_W(1-\alpha)$. The result follows with $\nu_n$ equal to a constant multiple of $\varrho_n$. ■
Supplemental Material I for "Central limit theorem and multiplier bootstrap when p is much larger than n"

Additional Theoretical Results and Omitted Proofs

V. Chernozhukov, D. Chetverikov, and K. Kato

Appendix E. A note on relation between Slepian and Stein type methods for normal approximations

To keep the notation simple, consider a random vector $X$ in $\mathbb{R}^p$ and a standard normal vector $Z$ in $\mathbb{R}^p$. We are interested in bounding

$$|\mathbb{E}[g(X)] - \mathbb{E}[g(Z)]|$$

over some collection of test functions $g \in \mathcal{G}$. Without loss of generality, suppose that $Z$ and $X$ are independent.

Consider Stein's partial differential equation:

$$g(x) - \mathbb{E}[g(Z)] = \Delta h(x) - x'\nabla h(x).$$

It is well known, e.g. [l^ and [l^, that an explicit solution for h in this 
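equation is given by

$$h(x) := -\int_0^1 \frac{1}{2t}\left(\mathbb{E}[g(\sqrt{t}\,x + \sqrt{1-t}\,Z)] - \mathbb{E}[g(Z)]\right)dt,$$

so that

$$\mathbb{E}[g(X)] - \mathbb{E}[g(Z)] = \mathbb{E}[\Delta h(X) - X'\nabla h(X)].$$

The Stein type method for normal approximation bounds the right side for $g \in \mathcal{G}$.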
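As a one-dimensional sanity check of the explicit solution (an illustration added here, not part of the original argument), take $p = 1$ and $g(x) = x^2$, so that $\mathbb{E}[g(Z)] = 1$:

```latex
% Illustration: p = 1, g(x) = x^2, so that E[g(Z)] = 1.
\begin{aligned}
\mathbb{E}[g(\sqrt{t}\,x + \sqrt{1-t}\,Z)] - \mathbb{E}[g(Z)] &= t x^2 + (1 - t) - 1 = t(x^2 - 1),\\
h(x) &= -\int_0^1 \frac{1}{2t}\, t(x^2 - 1)\,dt = -\frac{x^2 - 1}{2},\\
h''(x) - x h'(x) &= -1 - x(-x) = x^2 - 1 = g(x) - \mathbb{E}[g(Z)].
\end{aligned}
```

So in this case $h$ indeed solves Stein's equation, and the minus sign in front of the integral is what makes the signs match.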

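Next, let us consider the Slepian smart path interpolation:

$$Z(t) = \sqrt{t}\,X + \sqrt{1-t}\,Z.$$

Then we have

$$\mathbb{E}[g(X)] - \mathbb{E}[g(Z)] = \mathbb{E}\left[\int_0^1 \frac{1}{2}\nabla g(Z(t))'\left(\frac{X}{\sqrt{t}} - \frac{Z}{\sqrt{1-t}}\right)dt\right].$$

The Slepian type method, as used in our paper, bounds the right side for $g \in \mathcal{G}$.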

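Elementary calculations and integration by parts yield the following observation.

Lemma E.1. Suppose that $g : \mathbb{R}^p \to \mathbb{R}$ is a $C^2$-function with uniformly bounded derivatives up to order two. Then

$$\mathrm{I} := \mathbb{E}\left[\int_0^1 \frac{1}{2}\nabla g(Z(t))'\frac{X}{\sqrt{t}}\,dt\right] = -\mathbb{E}[X'\nabla h(X)]$$

and

$$\mathrm{II} := \mathbb{E}\left[\int_0^1 \frac{1}{2}\nabla g(Z(t))'\frac{Z}{\sqrt{1-t}}\,dt\right] = -\mathbb{E}[\Delta h(X)].$$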
Hence the Slepian and Stein methods both show that the difference between I and II is small or approaches zero under suitable conditions on $X$; therefore, they are very similar in spirit, if not identical. The details of treating the terms may differ from application to application.



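Proof of Lemma E.1. By definition of $h$, we have

$$-\mathbb{E}[X'\nabla h(X)] = \mathbb{E}\left[X'\int_0^1 \frac{1}{2\sqrt{t}}\nabla g(Z(t))\,dt\right] = \mathbb{E}\left[\int_0^1 \frac{1}{2}\nabla g(Z(t))'\frac{X}{\sqrt{t}}\,dt\right] = \mathrm{I}.$$

On the other hand, by definition of $h$ and Stein's identity (Lemma E.2),

$$-\mathbb{E}[\Delta h(X)] = \mathbb{E}\left[\int_0^1 \frac{1}{2}\Delta g(Z(t))\,dt\right] = \mathbb{E}\left[\int_0^1 \frac{1}{2}\nabla g(Z(t))'\frac{Z}{\sqrt{1-t}}\,dt\right] = \mathrm{II}.$$

This completes the proof.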



Lemma E.2 (Stein's identity). Let $W = (W_1, \dots, W_p)'$ be a centered Gaussian random vector in $\mathbb{R}^p$. Let $f : \mathbb{R}^p \to \mathbb{R}$ be a $C^1$-function such that $\mathbb{E}[|\partial_j f(W)|] < \infty$ for all $1 \leq j \leq p$. Then for every $1 \leq j \leq p$,

$$\mathbb{E}[W_j f(W)] = \sum_{k=1}^p \mathbb{E}[W_j W_k]\,\mathbb{E}[\partial_k f(W)].$$

Proof of Lemma E.2. See Section A.6 of [43], and also [42].
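Stein's identity is easy to check numerically. The sketch below (Python, standard library only) estimates both sides by Monte Carlo for a bivariate Gaussian; the covariance matrix, the test function $f(w) = \sin(w_1 + 2w_2)$ and the function name are illustrative assumptions, not taken from the text.

```python
import math
import random


def stein_identity_sides(sigma, n_samples=200_000, seed=0):
    """Monte Carlo estimates of both sides of Stein's identity
    E[W_j f(W)] = sum_k E[W_j W_k] E[d_k f(W)],  j = 1, 2,
    for a centered bivariate Gaussian W with covariance matrix `sigma`
    and the illustrative test function f(w) = sin(w_1 + 2 w_2)."""
    rng = random.Random(seed)
    # Cholesky factor of the 2x2 covariance matrix (assumed positive definite).
    l11 = math.sqrt(sigma[0][0])
    l21 = sigma[1][0] / l11
    l22 = math.sqrt(sigma[1][1] - l21 ** 2)
    lhs = [0.0, 0.0]    # running sums of W_j f(W)
    grad = [0.0, 0.0]   # running sums of d_k f(W)
    for _ in range(n_samples):
        g1, g2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        w1, w2 = l11 * g1, l21 * g1 + l22 * g2
        s = w1 + 2.0 * w2
        f, c = math.sin(s), math.cos(s)
        lhs[0] += w1 * f
        lhs[1] += w2 * f
        grad[0] += c          # d_1 f(w) = cos(w_1 + 2 w_2)
        grad[1] += 2.0 * c    # d_2 f(w) = 2 cos(w_1 + 2 w_2)
    lhs = [v / n_samples for v in lhs]
    grad = [v / n_samples for v in grad]
    rhs = [sum(sigma[j][k] * grad[k] for k in range(2)) for j in range(2)]
    return lhs, rhs


if __name__ == "__main__":
    lhs, rhs = stein_identity_sides([[1.0, 0.5], [0.5, 1.0]])
    print(lhs, rhs)  # the two lists should agree up to Monte Carlo error
```

For this $f$, both sides equal $\mathrm{Cov}(W_j, W_1 + 2W_2)\,e^{-\tau^2/2}$ with $\tau^2 = \mathrm{Var}(W_1 + 2W_2)$, which gives an exact target for the comparison.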



Appendix F. Omitted proofs 

F.1. Proof of Lemma A.3. Claim (a). Define $I_{ij} := 1\{|x_{ij}| \leq u(\mathbb{E}[x_{ij}^2])^{1/2}\}$ and observe that

(E[|x,,f])V5 ^ (E[|x,,-/,,f])V'' + (E„[|E[x,,I,,-]r])^/'? 

^ (E[|a;i,-/,,f l)^/" + {Ellxijl,,]"])'/" ^ 2(E[|x,,f ])^/''. 

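Claim (b). Observe that

$$\begin{aligned}|\mathbb{E}[\tilde x_{ij}\tilde x_{ik}] - \mathbb{E}[x_{ij}x_{ik}]| &\leq \mathbb{E}[|\tilde x_{ij} - x_{ij}||\tilde x_{ik}|] + \mathbb{E}[|x_{ij}||\tilde x_{ik} - x_{ik}|]\\ &\leq \sqrt{\mathbb{E}[(\tilde x_{ij} - x_{ij})^2]}\sqrt{\mathbb{E}[\tilde x_{ik}^2]} + \sqrt{\mathbb{E}[(\tilde x_{ik} - x_{ik})^2]}\sqrt{\mathbb{E}[x_{ij}^2]}\\ &\leq 2\varphi(u)\sqrt{\mathbb{E}[x_{ij}^2]}\sqrt{\mathbb{E}[x_{ik}^2]} + \varphi(u)\sqrt{\mathbb{E}[x_{ik}^2]}\sqrt{\mathbb{E}[x_{ij}^2]}\\ &\leq (3/2)\varphi(u)(\mathbb{E}[x_{ij}^2] + \mathbb{E}[x_{ik}^2]),\end{aligned}$$

where the first inequality follows from the triangle inequality, the second from the Cauchy-Schwarz inequality, the third from the definition of $\varphi(u)$ together with claim (a), and the last from the inequality $|ab| \leq (a^2 + b^2)/2$.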

Claim (c). This follows from the Cauchy-Schwarz inequality. 

Claim (d). We shall use the following lemma. 
Lemma F.1 (Tail Bounds for Self-Normalized Sums). Let $\xi_1, \dots, \xi_n$ be independent real-valued random variables such that $\mathbb{E}[\xi_i] = 0$ and $\mathbb{E}[\xi_i^2] < \infty$ for all $1 \leq i \leq n$. Let $S_n = \sum_{i=1}^n \xi_i$. Then for every $x > 0$,

$$\mathrm{P}(|S_n| > x(4B_n + V_n)) \leq 4\exp(-x^2/2),$$

where $B_n^2 = \sum_{i=1}^n \mathbb{E}[\xi_i^2]$ and $V_n^2 = \sum_{i=1}^n \xi_i^2$.

Proof of Lemma F.1. See [19], Theorem 2.16. ■
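Define

$$\Delta_j := 4\sqrt{\mathbb{E}[(x_{ij} - \tilde x_{ij})^2]} + \sqrt{\mathbb{E}_n[(x_{ij} - \tilde x_{ij})^2]}.$$

Then by Lemma F.1 (applied with $x = \sqrt{2\log(p/\gamma)}$, so that $4\exp(-x^2/2) = 4\gamma/p$) and the union bound, with probability at least $1 - 4\gamma$,

$$|X_j - \tilde X_j| \leq \Delta_j\sqrt{2\log(p/\gamma)}, \quad \text{for all } 1 \leq j \leq p.$$

By claim (c), for $u \geq u(\gamma)$, with probability at least $1 - \gamma$, for all $1 \leq j \leq p$,

$$\Delta_j = 4\sqrt{\mathbb{E}[(x_{ij} - \tilde x_{ij})^2]} + \sqrt{\mathbb{E}_n[(\mathbb{E}[x_{ij}I_{ij}])^2]} \leq 5\sqrt{\mathbb{E}[x_{ij}^2]}\,\varphi(u).$$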

The last two assertions imply claim (d). ■ 

F.2. Proof of Corollary 2.2. Since $M_2$ is bounded from below and above by positive constants, we may normalize $M_2 = 1$ without loss of generality. In this proof, let $C > 0$ denote a generic constant depending only on $c_1$ and $C_1$; its value may change from place to place.

For given $\gamma \in (0,1)$, denote $\ell_n := \log(pn/\gamma) \vee 1$ and let

u, := n3/845/8^3/4 ,^ n^/H-^/^Ml^\ 

Define $u := u(\gamma) \vee u_1 \vee u_2$ and $\beta := \sqrt{n}/(2\sqrt{2}u)$. Then $u \geq u(\gamma)$ and the choice of $\beta$ trivially obeys $2\sqrt{2}u\beta \leq \sqrt{n}$. So, by Theorem 2.2 and using the same argument as in the proof of Corollary 2.1, for every $\psi > 0$, we have for any $\bar\varphi(u) \geq \varphi(u)$,

(24) +i;^{u)y/log{p/-f) + (/3-1 logp + V^^) V log(p^)J . 

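Step 1. We claim that we can take $\bar\varphi(u) := CM_4^2/u$ for all $u > 0$. Since $\mathbb{E}[x_{ij}^2] \geq c_1$, we have $1\{|x_{ij}| > u(\mathbb{E}[x_{ij}^2])^{1/2}\} \leq 1\{|x_{ij}| > c_1^{1/2}u\}$. Hence

$$\mathbb{E}[x_{ij}^2 1\{|x_{ij}| > u(\mathbb{E}[x_{ij}^2])^{1/2}\}] \leq \mathbb{E}[x_{ij}^2 1\{|x_{ij}| > c_1^{1/2}u\}] \leq \mathbb{E}[x_{ij}^4 1\{|x_{ij}| > c_1^{1/2}u\}]/(c_1u^2) \leq \mathbb{E}[x_{ij}^4]/(c_1u^2) \leq M_4^4/(c_1u^2).$$

This implies $\varphi_x(u) \leq CM_4^2/u$. For $\varphi_y(u)$, note that

$$\mathbb{E}[y_{ij}^4] = 3(\mathbb{E}[y_{ij}^2])^2 = 3(\mathbb{E}[x_{ij}^2])^2 \leq 3\mathbb{E}[x_{ij}^4] \leq 3M_4^4,$$

and hence $\varphi_y(u) \leq CM_4^2/u$ as well. This implies the claim of this step.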



36 CHERNOZHUKOV, CHETVERIKOV, AND KATO 



Step 2. We shall bound the right side of (24) by suitably choosing $\psi$ depending on the range of $u$. In order to set up this choice, we define $u^*$ by the following equation:

^(u^)n3/V(M|^^/6)3/4 ^ ^ 

We then take 

[ in i'^iu)) if n < u*. 

We note that for $u < u^*$,

That is, when $u < u^*$, the smoothing parameter $\psi$ is smaller than when $u \geq u^*$.

Using these choices of parameters $\beta$ and $\psi$ and elementary calculations (which will be done in Step 3 below), we conclude from (24) that whether $u < u^*$ or $u \geq u^*$,

C7(n-i/V3/2 + 7). 

The bound in the corollary follows from this inequality. 

Step 3. (Computation of the bound on p). Note that since p ^ 1, we 

only had to consider the case where n~^/'^u^'^ ^ 1 since otherwise the 
inequality is trivial by taking, say, C = 1. Since ui = u^/^m'^^'^ /(^^ and 
U2 = u^/^mI^"^ /l^^ , we have 

(^(ui) ^ Cn-^l^tll''MllMll\ 
Also note that ip ^ n^^^, and so 1 V log(p'i/') ^ log(pn) ^ in- Therefore, 

logpViviog(pv) < r'C^^ < n-'/vj\ 

In addition, note that f3 < ^Jnju ^ ^Jnju\ = n^/^l^^ ^''^ =: ^ and 
ijj ^ (3 under either case. This implies that (-0^ + "0^/^ + "0/^^) ^ "0/3^ and 

Using these inequalities, we can compute the bounds claimed above.
(a). Bounding $\rho$ when $u \geq u^*$. Then

(V' + i^miu) < ^M-^) ^ ^Mu*) ^ n-^l^e^Ml!^ ^ n-'I'^ui^^- 
V^^(n)v/log(p/7) ^ ^mu)^ffnlP ^ ^Mu") ^ n-^'^ul'^J^; and 

r^VTn ^ ^ ^-1/2^^3/2. 
where we have used Step 1 and the fact that

The claimed bound on $\rho$ now follows.

(b). Bounding $\rho$ when $u < u^*$. Since $\psi$ is smaller than in case (a), by the calculations in case (a),

Moreover, using the definition of $\psi$, $u \geq u_2$, and the definition of $u_2$, we have
Analogously, and using $n \geq 1$, we have

This completes the proof. ■
Supplemental Material II for "Central limit theorem and multiplier bootstrap when p is much larger than n"

Additional Applications

V. Chernozhukov, D. Chetverikov, and K. Kato

Appendix G. Application: Multiple Hypothesis Testing via the Stepdown Method

In this section, we study the problem of multiple hypothesis testing in the framework of multiple linear regressions. (Note that the problem of testing multiple means is a special case of testing multiple regressions.) We combine a general stepdown procedure described in [40] with the multiplier bootstrap developed in this paper. In contrast with [40], our results do not require weak convergence arguments and, thus, can be applied to models with increasing numbers of both parameters and regressions. Notably, the number of regressions can be large in comparison with the sample size.

Let $(z_i, y_i)_{i=1}^n$ be a sample of independent observations where $z_i \in \mathbb{R}^p$ is a vector of non-stochastic covariates and $y_i \in \mathbb{R}^K$ is a vector of dependent random variables. For each $k = 1, \dots, K$, let $I_k \subset \{1, \dots, p\}$ be a subset of covariates used in the $k$-th regression. Denote by $|I_k| = p_k$ the number of covariates in the $k$-th regression, and let $\bar p = \max_{1\leq k\leq K} p_k$. Let $v_{ik}$ be a subvector of $z_i$ consisting of those elements of $z_i$ whose indices appear in $I_k$: $v_{ik} = (z_{ij})_{j\in I_k}$. We denote components of $v_{ik}$ by $v_{ikj}$, $j = 1, \dots, p_k$.

Without loss of generality, we assume that $I_k \cap I_{k'} = \emptyset$ for all $k \neq k'$ and

$$\sum_{1\leq k\leq K} p_k = p.$$

For each $k = 1, \dots, K$, consider the linear regression model

$$y_{ik} = v_{ik}'\beta_k + \varepsilon_{ik}, \quad i = 1, \dots, n,$$

where $\beta_k \in \mathbb{R}^{p_k}$ is an unknown parameter of interest, and $(\varepsilon_{ik})_{i=1}^n$ is a sequence of independent zero-mean unobservable scalar random variables. We allow for triangular array asymptotics so that everything in the model, and, in particular, the number of regressions $K$ and the dimensions $p$ and $p_k$, may depend on $n$. For brevity, however, we omit index $n$. We are interested in simultaneously testing the set of null hypotheses $H_{kj}: \beta_{kj} = 0$ against the alternatives $H'_{kj}: \beta_{kj} \neq 0$, $(k,j) \in W_0$, for some set of pairs $W_0$, where $\beta_{kj}$ denotes the $j$-th component of $\beta_k$, with strong control of the family-wise error rate. In other words, we seek a procedure that would reject at least one true null hypothesis with probability not greater than $\alpha + o(1)$ uniformly over the set of true null hypotheses. More formally, let $\Omega$ be the set of all data generating processes, and $\omega$ be the true process. Each null hypothesis $H_{kj}$ is equivalent to $\omega \in \Omega_{kj}$ for some subset $\Omega_{kj}$ of $\Omega$. Let $W$ denote the set of all pairs $(k,j)$ with $k = 1, \dots, K$ and $j = 1, \dots, p_k$:

$$W = \{(k,j): k = 1, \dots, K;\ j = 1, \dots, p_k\}.$$



CLT AND MULTIPLIER BOOTSTRAP WHEN p IS MUCH LARGER THAN n 39 



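For a subset $w \subset W$ let $\Omega^w = (\cap_{(k,j)\in w}\Omega_{kj}) \cap (\cap_{(k,j)\notin w}\Omega_{kj}^c)$ where $\Omega_{kj}^c = \Omega \setminus \Omega_{kj}$. The strong control of the family-wise error rate means

(26) $\sup_{w\subset W}\sup_{\omega\in\Omega^w}\mathrm{P}\{\text{reject at least one hypothesis among } H_{kj}, (k,j) \in w\} \leq \alpha + o(1).$

This setting is clearly of interest in many empirical studies.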

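Our approach is based on the simultaneous analysis of $t$-statistics for each component $\beta_{kj}$. Let $x_{ik} = (\mathbb{E}_n[v_{ik}v_{ik}'])^{-1}v_{ik}$. Then the OLS estimator $\hat\beta_k$ of $\beta_k$ is given by $\hat\beta_k = \mathbb{E}_n[x_{ik}y_{ik}]$. The corresponding residuals are $\hat\varepsilon_{ik} = y_{ik} - v_{ik}'\hat\beta_k$, $i = 1, \dots, n$. Since $(x_{ik})_{i=1}^n$ is non-stochastic, the covariance matrix of $\hat\beta_k$ is given by $V(\hat\beta_k) = \mathbb{E}_n[x_{ik}x_{ik}'\sigma_{ik}^2]/n$ where $\sigma_{ik}^2 = \mathbb{E}[\varepsilon_{ik}^2]$, $i = 1, \dots, n$.

The $t$-statistic for testing $H_{kj}$ against $H'_{kj}$ is $t_{kj} := |\hat\beta_{kj}|/\sqrt{\hat V(\hat\beta_k)_{jj}}$ where $\hat V(\hat\beta_k) = \mathbb{E}_n[x_{ik}x_{ik}'\hat\varepsilon_{ik}^2]/n$. Also define

$$t_{kj}^0 := \frac{|\sum_{i=1}^n x_{ikj}\varepsilon_{ik}/\sqrt{n}|}{\sqrt{\mathbb{E}_n[x_{ikj}^2\hat\varepsilon_{ik}^2]}}.$$

Note that $t_{kj} = t_{kj}^0$ under the hypothesis $H_{kj}$.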

The stepdown procedure of [40] is described as follows. For a subset $w \subset W$, let $c_{1-\alpha,w}$ be some estimator of the $(1-\alpha)$-quantile of $\max_{(k,j)\in w}t_{kj}^0$. On the first step, let $w(1) = W_0$. Reject all hypotheses $H_{kj}$, $(k,j) \in w(1)$, satisfying $t_{kj} > c_{1-\alpha,w(1)}$. If no null hypothesis is rejected, then stop. If some $H_{kj}$ are rejected, then let $w(2)$ be the set of all null hypotheses that were not rejected on the first step. On step $l \geq 2$, let $w(l) \subset W$ be the subset of null hypotheses that were not rejected up to step $l$. Reject all hypotheses $H_{kj}$, $(k,j) \in w(l)$, satisfying $t_{kj} > c_{1-\alpha,w(l)}$. If no null hypothesis is rejected, then stop. If some $H_{kj}$ are rejected, then let $w(l+1)$ be the subset of all null hypotheses among $(k,j) \in w(l)$ that were not rejected. Proceed in this way until the algorithm stops.
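The stepdown logic can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation of [40]: the critical-value rule is passed in as a function (a hypothetical interface) and is assumed to satisfy the monotonicity condition (27) stated below.

```python
from typing import Callable, Dict, FrozenSet, Hashable, Set


def stepdown(t_stats: Dict[Hashable, float],
             critical_value: Callable[[FrozenSet[Hashable]], float]) -> Set[Hashable]:
    """Generic stepdown procedure.

    t_stats        -- maps each hypothesis label (k, j) in W_0 to its t-statistic t_kj.
    critical_value -- maps a set w of labels to c_{1-alpha, w}; assumed monotone as in (27).
    Returns the set of rejected labels.
    """
    remaining = frozenset(t_stats)   # w(1) = W_0
    rejected: Set[Hashable] = set()
    while remaining:
        c = critical_value(remaining)
        newly_rejected = {h for h in remaining if t_stats[h] > c}
        if not newly_rejected:       # no hypothesis rejected on this step: stop
            break
        rejected |= newly_rejected
        remaining = remaining - newly_rejected   # w(l + 1): not rejected so far
    return rejected


if __name__ == "__main__":
    # Toy example with a hypothetical monotone critical value c(w) = 1 + 0.5 |w|.
    t = {(1, 1): 3.0, (1, 2): 2.1, (2, 1): 1.2}
    print(stepdown(t, lambda w: 1.0 + 0.5 * len(w)))
```

In the toy example, a single-step procedure using only $c(W_0) = 2.5$ would reject one hypothesis, whereas the stepdown procedure rejects two, because the critical value shrinks as $w(l)$ shrinks.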

[40] proved the following result. Suppose that $c_{1-\alpha,w}$ satisfies

(27) $c_{1-\alpha,w'} \leq c_{1-\alpha,w''}$ whenever $w' \subset w''$,

(28) $\sup_{w\subset W}\sup_{\omega\in\Omega^w}\mathrm{P}\left(\max_{(k,j)\in w}t_{kj}^0 > c_{1-\alpha,w}\right) \leq \alpha + o(1)$;

then inequality (26) holds. Indeed, let $w$ be the set of true null hypotheses. Suppose that the procedure rejects at least one of these hypotheses. Let $l$ be the step when the procedure rejected a true null hypothesis for the first time, and let $H_{k_0j_0}$ be this hypothesis. Clearly, we have $w(l) \supset w$. So,

$$\max_{(k,j)\in w}t_{kj}^0 \geq t_{k_0j_0}^0 = t_{k_0j_0} > c_{1-\alpha,w(l)} \geq c_{1-\alpha,w}.$$

Combining this chain of inequalities with (28) yields (26).

To obtain suitable $c_{1-\alpha,w}$ that satisfy inequalities (27) and (28) above, we can use the multiplier bootstrap method. Let $(e_i)_{i=1}^n$ be an i.i.d. sequence of $N(0,1)$ random variables that are independent of the data. Let $c_{1-\alpha,w}$ be the conditional $(1-\alpha)$-quantile of

(29) $\max_{(k,j)\in w}\dfrac{|\sum_{i=1}^n x_{ikj}\hat\varepsilon_{ik}e_i/\sqrt{n}|}{\sqrt{\mathbb{E}_n[x_{ikj}^2\hat\varepsilon_{ik}^2]}}$

given $(z_i, y_i)_{i=1}^n$. To prove that so defined critical values $c_{1-\alpha,w}$ satisfy inequalities (27) and (28), we will assume the following regularity condition.

(M) There are some constants $c_1 > 0$, $\underline\sigma^2 > 0$, $\bar\sigma^2 > 0$ and a sequence $B_n \geq 1$ of constants such that for $1 \leq i \leq n$, $1 \leq j \leq p$, $1 \leq k \leq K$, $1 \leq l \leq p_k$: (i) $|z_{ij}| \leq B_n$; (ii) $\mathbb{E}_n[z_{ij}^2] = 1$; (iii) $\underline\sigma^2 \leq \mathbb{E}[\varepsilon_{ik}^2] \leq \bar\sigma^2$; (iv) the minimum eigenvalue of $\mathbb{E}_n[v_{ik}v_{ik}']$ is bounded from below by $c_1$; and (v) $\mathbb{E}_n[x_{ikl}^2] \geq c_1$.

Theorem G.1 (Strong Control of Family-Wise Error Rate). Let $C_1 > 0$ be some constant and suppose that assumption M is satisfied. Moreover, suppose either

(a) E[ma.'x.i^k^x ^ikl ^ C*! for all 1 i ^ n, p^B^{logp)^ /n = o(l) and 

B^{log{pn)y /n = o(l); or 

(b) E[exp(|eifc|/Ci)] ^ 2 for all 1 i n, 1 ^ k ^ K , p^Bl{\ogpf/n = 
o(l) and pB'^{log{pn)y / n = o(l). 

Then the stepdown procedure with the multiplier bootstrap critical values $c_{1-\alpha,w}$ given above satisfies (26).



Comment G.1 (Relation to prior results). There is a vast literature on multiple hypothesis testing. Let us consider the simple case where $K = p$, $p_k = 1$ for all $k = 1, \dots, K$ and $v_{ik} = 1$, so that the $k$-th regression reduces to $y_{ik} = \beta_k + \varepsilon_{ik}$ (here $\beta_k$ is scalar). The problem then reduces to testing multiple means (without stepdown). It is instructive to see the implication of Theorem G.1 in this simple setting. Denote by $t_k$ the $t$-statistic for testing $H_k: \beta_k = 0$ against $H'_k: \beta_k \neq 0$, and let $c_{1-\alpha}$ be the conditional $(1-\alpha)$-quantile of

$$\max_{1\leq k\leq p}\frac{|\sum_{i=1}^n\hat\varepsilon_{ik}e_i/\sqrt{n}|}{\sqrt{\mathbb{E}_n[\hat\varepsilon_{ik}^2]}},$$

where $\hat\varepsilon_{ik} = y_{ik} - \bar y_k$, $\bar y_k = \mathbb{E}_n[y_{ik}]$, and $(e_i)_{i=1}^n$ is a sequence of i.i.d. $N(0,1)$ random variables independent of the data. Theorem G.1 implies that, when $H_k$ are true for all $k$, $\mathrm{P}(\max_{1\leq k\leq p}t_k > c_{1-\alpha}) \leq \alpha + o(1)$ (indeed, the inequality can be replaced by the equality "$=$") uniformly in the underlying distribution provided that $\underline\sigma^2 \leq \mathbb{E}[\varepsilon_{ik}^2] \leq \bar\sigma^2$, $\log p = o(n^{1/7})$, and either (a) $\mathbb{E}[\max_{1\leq k\leq p}\varepsilon_{ik}^4] \leq C_1$ or (b) $\mathbb{E}[\exp(|\varepsilon_{ik}|/C_1)] \leq 2$. Hence the multiplier bootstrap as described above leads to an asymptotically exact testing procedure for the multiple hypothesis testing problem in which the logarithm of the number of hypotheses is nearly of order $n^{1/7}$ (subject to the prescribed assumptions). Note here that no assumption on the dependency structure between $y_{i1}, \dots, y_{ip}$ is made.

The question of how large $p$ can be was studied in [21], but from a conservative perspective. The motivation there is to know how fast $p$ can grow while maintaining the size of the simultaneous test when we calculate critical values (conservatively) ignoring the dependency among $t_k$ and assuming that $t_k$ were distributed as, say, $N(0,1)$. This framework is conservative in that correlation amongst statistics is dealt away with union bounds, namely by Bonferroni-Holm procedures. In contrast, our approach takes into account the correlation amongst statistics and hence is asymptotically exact, that is, asymptotically non-conservative. ■
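The multiple-means case of Comment G.1 can be sketched directly. The Python below (standard library only) computes the $t$-statistics $t_k$ and approximates the conditional quantile $c_{1-\alpha}$ by a finite number of multiplier draws; the data layout (rows $i$, columns $k$), the number of draws and the function names are assumptions made for illustration.

```python
import math
import random
from typing import Iterable, List, Sequence


def t_statistics(y: Sequence[Sequence[float]]) -> List[float]:
    """t_k = |sqrt(n) * ybar_k| / sqrt(E_n[(y_ik - ybar_k)^2]) for each column k of the n x p array y."""
    n, p = len(y), len(y[0])
    stats = []
    for k in range(p):
        col = [row[k] for row in y]
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
        stats.append(abs(math.sqrt(n) * mean) / sd)
    return stats


def multiplier_draws(y: Sequence[Sequence[float]], n_boot: int = 500,
                     seed: int = 0) -> List[List[float]]:
    """Row b holds |sum_i eps_hat_ik e_i / sqrt(n)| / sqrt(E_n[eps_hat_ik^2]) for every k,
    where eps_hat_ik = y_ik - ybar_k and e_1, ..., e_n are i.i.d. N(0, 1), independent of y."""
    rng = random.Random(seed)
    n, p = len(y), len(y[0])
    means = [sum(row[k] for row in y) / n for k in range(p)]
    resid = [[row[k] - means[k] for k in range(p)] for row in y]
    sds = [math.sqrt(sum(r[k] ** 2 for r in resid) / n) for k in range(p)]
    draws = []
    for _ in range(n_boot):
        e = [rng.gauss(0.0, 1.0) for _ in range(n)]
        draws.append([abs(sum(resid[i][k] * e[i] for i in range(n))) / (math.sqrt(n) * sds[k])
                      for k in range(p)])
    return draws


def bootstrap_critical_value(draws: List[List[float]], subset: Iterable[int],
                             alpha: float = 0.05) -> float:
    """Empirical (1 - alpha)-quantile over the draws of the max over `subset`."""
    cols = list(subset)
    maxima = sorted(max(row[k] for k in cols) for row in draws)
    idx = min(len(maxima) - 1, max(0, math.ceil((1 - alpha) * len(maxima)) - 1))
    return maxima[idx]


if __name__ == "__main__":
    rng = random.Random(1)
    data = [[rng.gauss(0.0, 1.0) for _ in range(20)] for _ in range(200)]  # every H_k true
    c = bootstrap_critical_value(multiplier_draws(data, n_boot=200), range(20))
    print(max(t_statistics(data)), c)  # reject if the first number exceeds c
```

Reusing the same multiplier draws for every subset $w$ makes the simulated critical values monotone in $w$ exactly, in line with condition (27).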



Appendix H. Application: Adaptive Specification Testing 

In this section, we study the problem of adaptive specification testing. Let $(v_i, y_i)_{i=1}^n$ be a sample of independent random pairs where $y_i$ is a scalar dependent random variable, and $v_i \in \mathbb{R}^d$ is a vector of non-stochastic covariates. The null hypothesis, $H_0$, is that there exists $\beta \in \mathbb{R}^d$ such that

(30) $\mathbb{E}[y_i] = v_i'\beta$, $i = 1, \dots, n$.

The alternative hypothesis, $H_a$, is that there is no $\beta$ satisfying (30). We allow for triangular array asymptotics so that everything in the model may depend on $n$. For brevity, however, we omit index $n$.

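Let $\varepsilon_i = y_i - \mathbb{E}[y_i]$, $i = 1, \dots, n$. Then $\mathbb{E}[\varepsilon_i] = 0$, and under $H_0$, $y_i = v_i'\beta + \varepsilon_i$. To test $H_0$, consider a set of test functions $P_j(v_i)$, $j = 1, \dots, p$. Let $z_{ij} = P_j(v_i)$. We choose test functions so that $\mathbb{E}_n[z_{ij}v_i] = 0$ and $\mathbb{E}_n[z_{ij}^2] = 1$ for all $j = 1, \dots, p$. In our analysis, $p$ may be higher or even much higher than $n$. Let $\hat\beta = (\mathbb{E}_n[v_iv_i'])^{-1}(\mathbb{E}_n[v_iy_i])$ be an OLS estimator of $\beta$, and let $\hat\varepsilon_i = y_i - v_i'\hat\beta$, $i = 1, \dots, n$, be the corresponding residuals. Our test statistic is

$$T := \max_{1\leq j\leq p}\frac{|\sum_{i=1}^n z_{ij}\hat\varepsilon_i/\sqrt{n}|}{\sqrt{\mathbb{E}_n[z_{ij}^2\hat\varepsilon_i^2]}}.$$

The test rejects $H_0$ if $T$ is significantly large.
Note that since $\mathbb{E}_n[z_{ij}v_i] = 0$, we have

$$\sum_{i=1}^n z_{ij}\hat\varepsilon_i/\sqrt{n} = \sum_{i=1}^n z_{ij}(\varepsilon_i + v_i'(\beta - \hat\beta))/\sqrt{n} = \sum_{i=1}^n z_{ij}\varepsilon_i/\sqrt{n}.$$

Therefore, under $H_0$,

$$T = \max_{1\leq j\leq p}\frac{|\sum_{i=1}^n z_{ij}\varepsilon_i/\sqrt{n}|}{\sqrt{\mathbb{E}_n[z_{ij}^2\hat\varepsilon_i^2]}}.$$

This suggests that we can use the multiplier bootstrap to obtain a critical value for the test. More precisely, let $(e_i)_{i=1}^n$ be a sequence of independent $N(0,1)$ random variables that are independent of the data, and let

$$W := \max_{1\leq j\leq p}\frac{|\sum_{i=1}^n z_{ij}\hat\varepsilon_ie_i/\sqrt{n}|}{\sqrt{\mathbb{E}_n[z_{ij}^2\hat\varepsilon_i^2]}}.$$

The multiplier bootstrap critical value $c_W(1-\alpha)$ is the conditional $(1-\alpha)$-quantile of $W$ given the data. To prove the validity of the multiplier bootstrap, we will impose the following condition: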

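(S) There are some constants $c_1 > 0$, $C_1 > 0$, $\underline\sigma^2 > 0$, $\bar\sigma^2 > 0$, and a sequence $B_n \geq 1$ of constants such that for all $1 \leq i \leq n$, $1 \leq j \leq p$, $1 \leq k \leq d$: (i) $|z_{ij}| \leq B_n$; (ii) $\mathbb{E}_n[z_{ij}^2] = 1$; (iii) $\underline\sigma^2 \leq \mathbb{E}[\varepsilon_i^2] \leq \bar\sigma^2$; (iv) $|v_{ik}| \leq C_1$; (v) $d \leq C_1$; and (vi) the minimum eigenvalue of $\mathbb{E}_n[v_iv_i']$ is bounded from below by $c_1$.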

Theorem H.1 (Size Control of Adaptive Specification Test). Let $c_2 > 0$ be some constant. Suppose that assumption S is satisfied. Moreover, suppose that either

(a) E[e|] ^ Ci for all 1 ^ i ^ n and B^{log{pn)y / n ^ Cin~~^^; or 

(b) E[exp(|ei|/Ci)] 2 for all I ^ i ^ n and Bl{\og{pn)y /n ^ Cin-"^ . 

Then there exist constants $c > 0$ and $C > 0$, depending only on $c_1, c_2, C_1, \underline\sigma^2$ and $\bar\sigma^2$, such that under $H_0$, $|\mathrm{P}(T \leq c_W(1-\alpha)) - (1-\alpha)| \leq Cn^{-c}$.

Comment H.1. The literature on specification testing is large. In particular, [28] and [26] developed adaptive tests that are suitable for inference in $L^2$-norm. In contrast, our test is most suitable for inference in sup-norm. An advantage of our procedure is that selecting a wide class of test functions leads to a test that can effectively adapt to a wide range of alternatives, including those that cannot be well approximated by Hölder-continuous functions. ■
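A minimal sketch of the test in Python (standard library only), assuming a single covariate ($d = 1$) so that OLS reduces to a ratio of sums. Test functions are supplied as lists of values $P_j(v_i)$ and are first orthogonalized against $v_i$ and rescaled to meet the normalization $\mathbb{E}_n[z_{ij}v_i] = 0$, $\mathbb{E}_n[z_{ij}^2] = 1$. The function names and the number of bootstrap draws are hypothetical choices.

```python
import math
import random
from typing import List, Sequence, Tuple


def orthonormalize(z: Sequence[float], v: Sequence[float]) -> List[float]:
    """Remove the projection of z on the scalar covariate v, then rescale so that
    E_n[z_i v_i] = 0 and E_n[z_i^2] = 1 (the normalization required of z_ij)."""
    n = len(z)
    coef = sum(a * b for a, b in zip(z, v)) / sum(b * b for b in v)
    r = [a - coef * b for a, b in zip(z, v)]
    scale = math.sqrt(sum(a * a for a in r) / n)
    return [a / scale for a in r]


def spec_test(y: Sequence[float], v: Sequence[float], tests: Sequence[Sequence[float]],
              alpha: float = 0.05, n_boot: int = 500, seed: int = 0) -> Tuple[float, float]:
    """Return (T, c_W(1 - alpha)) for the sup-type specification test with d = 1."""
    n = len(y)
    beta_hat = sum(a * b for a, b in zip(v, y)) / sum(b * b for b in v)   # OLS, one regressor
    resid = [yi - beta_hat * vi for yi, vi in zip(y, v)]
    zs = [orthonormalize(z, v) for z in tests]
    denom = [math.sqrt(sum((zi * ri) ** 2 for zi, ri in zip(z, resid)) / n) for z in zs]

    def max_stat(weights: Sequence[float]) -> float:
        # max_j |sum_i z_ij eps_hat_i w_i / sqrt(n)| / sqrt(E_n[z_ij^2 eps_hat_i^2])
        return max(abs(sum(zi * ri * wi for zi, ri, wi in zip(z, resid, weights)))
                   / (math.sqrt(n) * d) for z, d in zip(zs, denom))

    T = max_stat([1.0] * n)                  # weights all equal to one give T
    rng = random.Random(seed)
    draws = sorted(max_stat([rng.gauss(0.0, 1.0) for _ in range(n)]) for _ in range(n_boot))
    c_w = draws[min(n_boot - 1, max(0, math.ceil((1 - alpha) * n_boot) - 1))]
    return T, c_w
```

Writing $T$ as the bootstrap statistic with all multipliers equal to one makes the parallel between $T$ and $W$ explicit; the test rejects $H_0$ when $T > c_W(1-\alpha)$.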



Appendix I. Proofs for Section G

I.1. Proof of Theorem G.1. The multiplier bootstrap critical value $c_{1-\alpha,w}$ clearly satisfies $c_{1-\alpha,w} \leq c_{1-\alpha,w'}$ whenever $w \subset w'$, so inequality (27) is satisfied. Therefore, it suffices to prove (28). For notational convenience, we will only consider the case $w = W$ and suppress the uniformity in the underlying distribution. The general case follows from inspection of the proof.

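Let us define

$$T_0 := \max_{(k,j)\in W}\frac{|\sum_{i=1}^n x_{ikj}\varepsilon_{ik}/\sqrt{n}|}{\sqrt{\mathbb{E}_n[x_{ikj}^2\sigma_{ik}^2]}},\qquad W_0 := \max_{(k,j)\in W}\frac{|\sum_{i=1}^n x_{ikj}\varepsilon_{ik}e_i/\sqrt{n}|}{\sqrt{\mathbb{E}_n[x_{ikj}^2\sigma_{ik}^2]}}.$$

We shall prove that $\mathrm{P}(T > c_W(1-\alpha)) = \alpha + o(1)$, where, recall, $c_W(1-\alpha)$ is the conditional $(1-\alpha)$-quantile of $W$ given $(\varepsilon_{ik})$. Here we will only consider case (a) of the theorem. The proof for case (b) is similar and hence omitted.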
We make use of Corollary 3.1-(iv) to prove the desired claim. Define

We first verify conditions (14) and (15) in Section 3. We will use the following facts directly deduced from assumption M:

(31) $\max_{i,k,j}|x_{ikj}| \leq \max_{i,k}\|x_{ik}\| \overset{(1)}{\leq} c_1^{-1}\max_{i,k}\|v_{ik}\| \leq c_1^{-1}\sqrt{\bar p}\max_{i,k,j}|v_{ikj}| \overset{(2)}{\leq} c_1^{-1}\sqrt{\bar p}B_n$,

(32) $\max_{k,j}\mathbb{E}_n[x_{ikj}^2] \leq \max_k\mathbb{E}_n[\|x_{ik}\|^2] \overset{(3)}{\leq} c_1^{-2}\max_k\mathbb{E}_n[\|v_{ik}\|^2] \overset{(4)}{\leq} c_1^{-2}\bar p$,

where (1) and (3) follow from assumption M-(iv) and the definition of $x_{ik}$, (2) is from M-(i) since $v_{ik}$ is a subvector of $z_i$, and (4) is due to M-(ii). We shall first prove some lemmas. In these lemmas, we will assume all the conditions in Theorem G.1 case (a) without mentioning so.

Lemma I.1. $\sum_{i=1}^n x_{ikj}\varepsilon_{ik}/\sqrt{n} = O_p(r_{n1})$ uniformly over $k = 1, \dots, K$ and $j = 1, \dots, p_k$, where $r_{n1} = \sqrt{\bar p\log p}$.

Proof. By Lemma A.1 combined with inequalities (31) and (32), we have

$$\mathbb{E}\Big[\max_{k,j}\Big|\sum_{i=1}^n x_{ikj}\varepsilon_{ik}/\sqrt{n}\Big|\Big] = O\big(\sqrt{\bar p}B_n(\log p)/n^{1/4} + \sqrt{\bar p\log p}\big) = O(\sqrt{\bar p\log p}),$$

where the second step follows because $B_n\sqrt{\log p}/n^{1/4} = o(1)$. The claim follows from Markov's inequality. ■

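Lemma I.2. $\mathbb{E}_n[x_{ikj}^2(\hat\varepsilon_{ik}^2 - \sigma_{ik}^2)] = O_p(r_{n2})$ uniformly over $k = 1, \dots, K$ and $j = 1, \dots, p_k$, where $r_{n2} = \bar pB_n^2(\log p)/\sqrt{n}$.

Proof. We have

$$\begin{aligned}\mathbb{E}_n[x_{ikj}^2(\hat\varepsilon_{ik}^2 - \sigma_{ik}^2)] &= \mathbb{E}_n[x_{ikj}^2(\varepsilon_{ik}^2 - \sigma_{ik}^2)] + \mathbb{E}_n[x_{ikj}^2(v_{ik}'(\hat\beta_k - \beta_k))^2]\\ &\quad - 2\mathbb{E}_n[x_{ikj}^2\varepsilon_{ik}v_{ik}'(\hat\beta_k - \beta_k)]\\ &=: I_{jk} + II_{jk} + III_{jk}.\end{aligned}$$

We will show in Steps 1-3 below that $I_{jk} = O_p(\bar pB_n^2(\log p)/\sqrt{n})$, $II_{jk} = O_p(\bar p^2B_n^2(\log p)/n)$, and $III_{jk} = O_p(\bar p^2B_n^2(\log p)/n)$ uniformly over $k = 1, \dots, K$ and $j = 1, \dots, p_k$. The claim of the lemma follows since $\bar p/\sqrt{n} \to 0$.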

Step 1. We prove that $I_{jk} = \mathbb{E}_n[x_{ikj}^2(\varepsilon_{ik}^2 - \sigma_{ik}^2)] = O_p(\bar pB_n^2(\log p)/\sqrt{n})$ uniformly over $k = 1, \dots, K$ and $j = 1, \dots, p_k$.

By Lemma A.1 combined with inequalities (31) and (32), we have

$$\mathbb{E}\Big[\max_{k,j}|\mathbb{E}_n[x_{ikj}^2(\varepsilon_{ik}^2 - \sigma_{ik}^2)]|\Big] = O\big(\bar pB_n^2(\log p)/\sqrt{n} + \bar pB_n\sqrt{(\log p)/n}\big) = O(\bar pB_n^2(\log p)/\sqrt{n}),$$

where the second step follows because $B_n \geq 1$. The claim of this step follows from Markov's inequality.
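Step 2. We prove that

$$II_{jk} = \mathbb{E}_n[x_{ikj}^2(v_{ik}'(\hat\beta_k - \beta_k))^2] = O_p(\bar p^2B_n^2(\log p)/n)$$

uniformly over $k = 1, \dots, K$ and $j = 1, \dots, p_k$. We have

$$\begin{aligned}\max_{k,j}\mathbb{E}_n[x_{ikj}^2(v_{ik}'(\hat\beta_k - \beta_k))^2] &\overset{(1)}{\leq} c_1^{-2}\bar pB_n^2\max_k\mathbb{E}_n[(v_{ik}'(\hat\beta_k - \beta_k))^2]\\ &= c_1^{-2}\bar pB_n^2\max_k\mathbb{E}_n[\varepsilon_{ik}v_{ik}']\,\mathbb{E}_n[v_{ik}v_{ik}']^{-1}\,\mathbb{E}_n[v_{ik}\varepsilon_{ik}]\\ &\overset{(2)}{\leq} c_1^{-3}\bar pB_n^2\max_k\|\mathbb{E}_n[v_{ik}\varepsilon_{ik}]\|^2\\ &\leq c_1^{-3}\bar p^2B_n^2\max_{k,j}|\mathbb{E}_n[v_{ikj}\varepsilon_{ik}]|^2\\ &\overset{(3)}{=} O_p(\bar p^2B_n^2(\log p)/n),\end{aligned}$$

where (1) follows from inequality (31), (2) from assumption M-(iv), and (3) from application of Lemma A.1. The claim of this step follows.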
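Step 3. We prove that

$$III_{jk} = \mathbb{E}_n[x_{ikj}^2\varepsilon_{ik}(v_{ik}'(\hat\beta_k - \beta_k))] = O_p(\bar p^2B_n^2(\log p)/n)$$

uniformly over $k = 1, \dots, K$ and $j = 1, \dots, p_k$. We have

$$\begin{aligned}\max_{k,j}|\mathbb{E}_n[x_{ikj}^2\varepsilon_{ik}(v_{ik}'(\hat\beta_k - \beta_k))]| &\leq \max_{k,j}\|\mathbb{E}_n[x_{ikj}^2\varepsilon_{ik}v_{ik}]\|\,\|\hat\beta_k - \beta_k\|\\ &\leq \sqrt{\bar p}\max_{k,j,l}|\mathbb{E}_n[x_{ikj}^2\varepsilon_{ik}v_{ikl}]|\,\max_k\|\hat\beta_k - \beta_k\|.\end{aligned}$$

Then

$$\max_k\|\hat\beta_k - \beta_k\| = \max_k\|\mathbb{E}_n[v_{ik}v_{ik}']^{-1}\mathbb{E}_n[v_{ik}\varepsilon_{ik}]\| \overset{(1)}{\leq} c_1^{-1}\max_k\|\mathbb{E}_n[v_{ik}\varepsilon_{ik}]\| \leq c_1^{-1}\sqrt{\bar p}\max_{k,j}|\mathbb{E}_n[v_{ikj}\varepsilon_{ik}]| \overset{(2)}{=} O_p\big(\sqrt{\bar p(\log p)/n}\big),$$

where (1) follows from assumption M-(iv) and (2) is as in Step 2. In addition, by Lemma A.1 combined with inequalities (31) and (32), we have

$$\mathbb{E}\Big[\max_{k,j,l}|\mathbb{E}_n[x_{ikj}^2\varepsilon_{ik}v_{ikl}]|\Big] = O\big(\bar pB_n^3(\log p)/n^{3/4} + \bar pB_n^2\sqrt{(\log p)/n}\big) = O\big(\bar pB_n^2\sqrt{(\log p)/n}\big),$$

where the last step is because $B_n\sqrt{\log p}/n^{1/4} = o(1)$. Combining these bounds yields the claim of this step. ■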

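In Lemmas I.3 and I.4, $\mathbb{E}_e[\cdot]$ denotes the expectation with respect to $(e_i)_{i=1}^n$ conditional on $(\varepsilon_{ik})$.

Lemma I.3. $\sum_{i=1}^n x_{ikj}\hat\varepsilon_{ik}e_i/\sqrt{n} = O_p(r_{n1})$ uniformly over $k = 1, \dots, K$ and $j = 1, \dots, p_k$. Recall that $r_{n1} = \sqrt{\bar p\log p}$.

Proof. We have

$$\begin{aligned}\mathbb{E}_e\Big[\max_{k,j}\Big|\sum_{i=1}^n x_{ikj}\hat\varepsilon_{ik}e_i/\sqrt{n}\Big|\Big] &\overset{(1)}{\lesssim} \sqrt{\log p}\max_{k,j}(\mathbb{E}_n[x_{ikj}^2\hat\varepsilon_{ik}^2])^{1/2}\\ &\overset{(2)}{=} \sqrt{\log p}\max_{k,j}(\mathbb{E}_n[x_{ikj}^2\sigma_{ik}^2] + O_p(r_{n2}))^{1/2}\\ &\overset{(3)}{\leq} \sqrt{\log p}\max_{k,j}(\mathbb{E}_n[x_{ikj}^2\sigma_{ik}^2])^{1/2} + O_p(r_{n2}\sqrt{\log p})\\ &\overset{(4)}{\leq} \bar\sigma\sqrt{\log p}\max_{k,j}(\mathbb{E}_n[x_{ikj}^2])^{1/2} + O_p(r_{n2}\sqrt{\log p})\\ &\overset{(5)}{=} O_p(\sqrt{\bar p\log p}),\end{aligned}$$

where (1) follows from Pisier's inequality, (2) from Lemma I.2, (3) from application of Taylor's theorem together with the fact that $r_{n2} = o(1)$ and $\mathbb{E}_n[x_{ikj}^2\sigma_{ik}^2]$ is bounded away from zero (which is guaranteed by assumptions M-(iii) and M-(v)), (4) from assumption M-(iii), and (5) is due to inequality (32) and $r_{n2} = o(1)$. The claim of the lemma follows. ■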

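Lemma I.4. $\sum_{i=1}^n x_{ikj}(\hat\varepsilon_{ik} - \varepsilon_{ik})e_i/\sqrt{n} = O_p(r_{n3})$ uniformly over $k = 1, \dots, K$ and $j = 1, \dots, p_k$, where $r_{n3} = \bar pB_n(\log p)/\sqrt{n}$.

Proof. We have

$$\begin{aligned}\mathbb{E}_e\Big[\max_{k,j}\Big|\sum_{i=1}^n x_{ikj}(\hat\varepsilon_{ik} - \varepsilon_{ik})e_i/\sqrt{n}\Big|\Big] &\overset{(1)}{\lesssim} \sqrt{\log p}\max_{k,j}(\mathbb{E}_n[x_{ikj}^2(\hat\varepsilon_{ik} - \varepsilon_{ik})^2])^{1/2}\\ &\overset{(2)}{=} \sqrt{\log p}\max_{k,j}(\mathbb{E}_n[x_{ikj}^2(v_{ik}'(\hat\beta_k - \beta_k))^2])^{1/2}\\ &\overset{(3)}{=} O_p(\bar pB_n(\log p)/\sqrt{n}),\end{aligned}$$

where (1) follows from Pisier's inequality, (2) is by definition of $\hat\varepsilon_{ik}$, and (3) is by Step 2 in the proof of Lemma I.2. The claim follows. ■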

Going back to the proof of Theorem G.1, by Lemmas I.1 and I.2 and the fact that $\mathbb{E}_n[x_{ikj}^2\sigma_{ik}^2]$ is bounded away from zero, we have

$$T = \max_{(k,j)\in W}\frac{|\sum_{i=1}^n x_{ikj}\varepsilon_{ik}/\sqrt{n}|}{\sqrt{\mathbb{E}_n[x_{ikj}^2\hat\varepsilon_{ik}^2]}} = T_0 + O_p(r_{n1}r_{n2}) = T_0 + o_p(1/\sqrt{\log p}),$$

where the last step uses the fact that $\bar p^3B_n^4(\log p)^4/n = o(1)$. Similarly, by Lemmas I.2-I.4, we have

$$\begin{aligned}W &= \max_{(k,j)\in W}\frac{|\sum_{i=1}^n x_{ikj}\hat\varepsilon_{ik}e_i/\sqrt{n}|}{\sqrt{\mathbb{E}_n[x_{ikj}^2\sigma_{ik}^2]}} + O_p(r_{n1}r_{n2})\\ &= W_0 + O_p(r_{n1}r_{n2} + r_{n3}) = W_0 + o_p(1/\sqrt{\log p}),\end{aligned}$$

where the last step uses the fact that $\bar pB_n(\log p)^{3/2}/\sqrt{n} = o(1)$. Hence it is verified that conditions (14) and (15) in Section 3 are satisfied with some sequences $\zeta_1 = \zeta_{1n}$ and $\zeta_2 = \zeta_{2n}$ such that $\zeta_1\sqrt{\log p} + \zeta_2 = o(1)$. Therefore, the desired claim follows from Corollary 3.1-(iv). ■

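Appendix J. Proofs for Section H

J.1. Proof of Theorem H.1. We only consider case (a). The proof for case (b) is similar and hence omitted. In this proof, let $c, c', C, C'$ denote generic positive constants depending only on $c_1, c_2, C_1, \underline\sigma^2, \bar\sigma^2$; their values may change from place to place. Let (with $\sigma_i^2 := \mathbb{E}[\varepsilon_i^2]$)

$$T_0 := \max_{1\leq j\leq p}\frac{|\sum_{i=1}^n z_{ij}\varepsilon_i/\sqrt{n}|}{\sqrt{\mathbb{E}_n[z_{ij}^2\sigma_i^2]}}\quad\text{and}\quad W_0 := \max_{1\leq j\leq p}\frac{|\sum_{i=1}^n z_{ij}\varepsilon_ie_i/\sqrt{n}|}{\sqrt{\mathbb{E}_n[z_{ij}^2\sigma_i^2]}}.$$

We make use of Corollary 3.1-(iv). To this end, we shall verify conditions (14) and (15) in Section 3, which will be done separately in Steps 1 and 2, respectively.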

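Step 1. We show that $\mathrm{P}(|T - T_0| > \zeta_1) < \zeta_2$ for some $\zeta_1$ and $\zeta_2$ satisfying $\zeta_1\sqrt{\log p} + \zeta_2 \leq Cn^{-c}$.

By Corollary 2.3-(v), we have

$$\mathrm{P}\Big(\max_{1\leq j\leq p}\Big|\sum_{i=1}^n z_{ij}\varepsilon_i/\sqrt{n}\Big| > t\Big) \leq \mathrm{P}\Big(\max_{1\leq j\leq p}\Big|\sum_{i=1}^n z_{ij}\sigma_ie_i/\sqrt{n}\Big| > t\Big) + Cn^{-c},$$

uniformly in $t \in \mathbb{R}$. By the Gaussian concentration inequality, for every $t > 0$, we have

$$\mathrm{P}\Big(\max_{1\leq j\leq p}\Big|\sum_{i=1}^n z_{ij}\sigma_ie_i/\sqrt{n}\Big| > \mathbb{E}\Big[\max_{1\leq j\leq p}\Big|\sum_{i=1}^n z_{ij}\sigma_ie_i/\sqrt{n}\Big|\Big] + Ct\Big) \leq e^{-t^2}.$$

Since $\mathbb{E}[\max_{1\leq j\leq p}|\sum_{i=1}^n z_{ij}\sigma_ie_i/\sqrt{n}|] \leq C\sqrt{\log p}$, we conclude that

(33) $\mathrm{P}\big(\max_{1\leq j\leq p}|\sum_{i=1}^n z_{ij}\varepsilon_i/\sqrt{n}| > C\sqrt{\log(pn)}\big) \leq C'n^{-c}.$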
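The bound $\mathbb{E}[\max_{1\leq j\leq p}|\sum_{i=1}^n z_{ij}\sigma_ie_i/\sqrt{n}|] \leq C\sqrt{\log p}$ used above is an instance of the classical fact that, for centered Gaussian variables $G_1, \dots, G_p$ with variances at most one, $\mathbb{E}[\max_j|G_j|] \leq \sqrt{2\log(2p)}$ whatever their correlation (here each coordinate has variance $\mathbb{E}_n[z_{ij}^2\sigma_i^2] \leq \bar\sigma^2$). The Monte Carlo sketch below (Python, standard library only) checks this numerically; the equicorrelated design and the function names are illustrative assumptions, not the design in the text.

```python
import math
import random


def mc_expected_max_abs(p: int, rho: float, n_rep: int = 2000, seed: int = 0) -> float:
    """Monte Carlo estimate of E[max_{j<=p} |G_j|] for an equicorrelated Gaussian vector
    with unit variances and common correlation rho in [0, 1)."""
    rng = random.Random(seed)
    a, b = math.sqrt(rho), math.sqrt(1.0 - rho)
    total = 0.0
    for _ in range(n_rep):
        common = rng.gauss(0.0, 1.0)   # shared factor producing the correlation
        total += max(abs(a * common + b * rng.gauss(0.0, 1.0)) for _ in range(p))
    return total / n_rep


def max_bound(p: int) -> float:
    """sqrt(2 log(2p)): bound on E[max_j |G_j|] for centered Gaussians with variances <= 1."""
    return math.sqrt(2.0 * math.log(2 * p))


if __name__ == "__main__":
    for p in (10, 100, 1000):
        print(p, mc_expected_max_abs(p, rho=0.5, n_rep=500), max_bound(p))
```

The bound grows only like $\sqrt{\log p}$, which is what allows $p$ to be much larger than $n$ in the concentration step.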

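Moreover,

$$\mathbb{E}_n[z_{ij}^2(\hat\varepsilon_i^2 - \sigma_i^2)] = \mathbb{E}_n[z_{ij}^2(\hat\varepsilon_i - \varepsilon_i)^2] + \mathbb{E}_n[z_{ij}^2(\varepsilon_i^2 - \sigma_i^2)] + 2\mathbb{E}_n[z_{ij}^2\varepsilon_i(\hat\varepsilon_i - \varepsilon_i)] =: I_j + II_j + III_j.$$

Consider $I_j$. We have

$$\max_{1\leq j\leq p}I_j \overset{(1)}{\leq} \max_{1\leq i\leq n}(\hat\varepsilon_i - \varepsilon_i)^2 \overset{(2)}{\leq} C\|\hat\beta - \beta\|^2 \overset{(3)}{\leq} C\|\mathbb{E}_n[v_i\varepsilon_i]\|^2,$$

where (1) follows from assumption S-(ii), (2) from S-(iv) and S-(v), and (3) from S-(vi). Since $\mathbb{E}[\|\mathbb{E}_n[v_i\varepsilon_i]\|^2] \leq C/n$, by Markov's inequality, for every $t > 0$,

(34) $\mathrm{P}\big(\max_{1\leq j\leq p}\mathbb{E}_n[z_{ij}^2(\hat\varepsilon_i - \varepsilon_i)^2] > t\big) \leq C/(nt).$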
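Consider $II_j$. By Lemma A.1 and Markov's inequality, we have

(35) $\mathrm{P}\big(\max_{1\leq j\leq p}|\mathbb{E}_n[z_{ij}^2(\varepsilon_i^2 - \sigma_i^2)]| > t\big) \leq CB_n^2(\log p)/(\sqrt{n}t).$

Consider $III_j$. We have $|III_j| \leq 2|\mathbb{E}_n[z_{ij}^2v_i'(\hat\beta - \beta)\varepsilon_i]| \leq 2\|\mathbb{E}_n[z_{ij}^2\varepsilon_iv_i]\|\,\|\hat\beta - \beta\|$. Hence

$$\mathrm{P}\Big(\max_{1\leq j\leq p}|\mathbb{E}_n[z_{ij}^2\varepsilon_i(\hat\varepsilon_i - \varepsilon_i)]| > t\Big) \leq \mathrm{P}\Big(\max_{1\leq j\leq p}\|\mathbb{E}_n[z_{ij}^2\varepsilon_iv_i]\| > t\Big) + \mathrm{P}(\|\hat\beta - \beta\| > 1)$$

(36) $\leq C[B_n^2(\log p)/(\sqrt{n}t) + 1/n].$

By (34)-(36), we have

(37) $\mathrm{P}\big(\max_{1\leq j\leq p}|\mathbb{E}_n[z_{ij}^2(\hat\varepsilon_i^2 - \sigma_i^2)]| > t\big) \leq C[B_n^2(\log p)/(\sqrt{n}t) + 1/(nt) + 1/n].$

In particular,

$$\mathrm{P}\Big(\max_{1\leq j\leq p}|\mathbb{E}_n[z_{ij}^2(\hat\varepsilon_i^2 - \sigma_i^2)]| > \underline\sigma^2/2\Big) \leq Cn^{-c}.$$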



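Since $\mathbb{E}_n[z_{ij}^2\sigma_i^2] \geq \underline\sigma^2 > 0$ (which is guaranteed by S-(iii) and S-(ii)), on the event $\max_{1\leq j\leq p}|\mathbb{E}_n[z_{ij}^2(\hat\varepsilon_i^2 - \sigma_i^2)]| \leq \underline\sigma^2/2$, we have

$$\min_{1\leq j\leq p}\mathbb{E}_n[z_{ij}^2\hat\varepsilon_i^2] \geq \min_{1\leq j\leq p}\mathbb{E}_n[z_{ij}^2\sigma_i^2] - \underline\sigma^2/2 \geq \underline\sigma^2/2,$$

and hence

$$\begin{aligned}|T - T_0| &\leq \max_{1\leq j\leq p}\frac{\big|\sqrt{\mathbb{E}_n[z_{ij}^2\sigma_i^2]} - \sqrt{\mathbb{E}_n[z_{ij}^2\hat\varepsilon_i^2]}\big|}{\sqrt{\mathbb{E}_n[z_{ij}^2\hat\varepsilon_i^2]}} \times T_0\\ &\leq C\max_{1\leq j\leq p}\big|\sqrt{\mathbb{E}_n[z_{ij}^2\sigma_i^2]} - \sqrt{\mathbb{E}_n[z_{ij}^2\hat\varepsilon_i^2]}\big| \times T_0\\ &\leq C\max_{1\leq j\leq p}|\mathbb{E}_n[z_{ij}^2\sigma_i^2] - \mathbb{E}_n[z_{ij}^2\hat\varepsilon_i^2]| \times T_0,\end{aligned}$$

where the last step uses the simple fact that

$$|\sqrt{a} - \sqrt{b}| = \frac{|a - b|}{\sqrt{a} + \sqrt{b}} \leq \frac{|a - b|}{\sqrt{a}}, \quad a > 0,\ b \geq 0.$$

By (33) and (37), for every $t > 0$,

$$\mathrm{P}\big(|T - T_0| > Ct\sqrt{\log(pn)}\big) \leq C[n^{-c} + B_n^2(\log p)/(\sqrt{n}t) + 1/(nt)].$$

By choosing $t = (\log(pn))^{-1}n^{-c'}$ with sufficiently small $c' > 0$, we obtain the claim of this step.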

Step 2. We show that $\mathrm{P}(\mathrm{P}_e(|W - W_0| > \zeta_1) > \zeta_2) < \zeta_2$ for some $\zeta_1$ and $\zeta_2$ satisfying $\zeta_1\sqrt{\log p} + \zeta_2 \leq Cn^{-c}$.




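For $0 < t \leq \underline\sigma^2/2$, consider the event

$$\mathcal{E} = \Big\{(\varepsilon_i)_{i=1}^n : \max_{1\leq j\leq p}|\mathbb{E}_n[z_{ij}^2(\varepsilon_i^2 - \sigma_i^2)]| \leq t,\ \max_{1\leq i\leq n}(\hat\varepsilon_i - \varepsilon_i)^2 \leq t^2\Big\}.$$

By calculations in Step 1, $\mathrm{P}(\mathcal{E}) \geq 1 - C[B_n^2(\log p)/(\sqrt{n}t) + 1/(nt^2) + 1/n]$. We shall show that, on this event,

(38) $\mathrm{P}_e\big(\max_{1\leq j\leq p}|\sum_{i=1}^n z_{ij}\varepsilon_ie_i/\sqrt{n}| > C\sqrt{\log(pn)}\big) \leq n^{-1}$,

(39) $\mathrm{P}_e\big(\max_{1\leq j\leq p}|\sum_{i=1}^n z_{ij}(\hat\varepsilon_i - \varepsilon_i)e_i/\sqrt{n}| > Ct\sqrt{\log(pn)}\big) \leq n^{-1}$.

For (38), by the Gaussian concentration inequality, for every $s > 0$,

$$\mathrm{P}_e\Big(\max_{1\leq j\leq p}\Big|\sum_{i=1}^n z_{ij}\varepsilon_ie_i/\sqrt{n}\Big| > \mathbb{E}_e\Big[\max_{1\leq j\leq p}\Big|\sum_{i=1}^n z_{ij}\varepsilon_ie_i/\sqrt{n}\Big|\Big] + Cs\Big) \leq e^{-s^2},$$

where we have used the fact $\mathbb{E}_n[z_{ij}^2\varepsilon_i^2] = \mathbb{E}_n[z_{ij}^2\sigma_i^2] + \mathbb{E}_n[z_{ij}^2(\varepsilon_i^2 - \sigma_i^2)] \leq \bar\sigma^2 + t \leq \bar\sigma^2 + \underline\sigma^2/2$ on the event $\mathcal{E}$. Here $\mathbb{E}_e[\cdot]$ means the expectation with respect to $(e_i)_{i=1}^n$ conditional on $(\varepsilon_i)_{i=1}^n$. Moreover, on the event $\mathcal{E}$,

$$\mathbb{E}_e\Big[\max_{1\leq j\leq p}\Big|\sum_{i=1}^n z_{ij}\varepsilon_ie_i/\sqrt{n}\Big|\Big] \leq C\sqrt{\log p}.$$

Hence by choosing $s = \sqrt{\log n}$, we obtain (38). Inequality (39) follows similarly, by noting that $(\mathbb{E}_n[z_{ij}^2(\hat\varepsilon_i - \varepsilon_i)^2])^{1/2} \leq \max_{1\leq i\leq n}|\hat\varepsilon_i - \varepsilon_i| \leq t$ on the event $\mathcal{E}$.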
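Define

$$W_1 := \max_{1\leq j\leq p}\frac{|\sum_{i=1}^n z_{ij}\hat\varepsilon_ie_i/\sqrt{n}|}{\sqrt{\mathbb{E}_n[z_{ij}^2\sigma_i^2]}}.$$

Note that $\mathbb{E}_n[z_{ij}^2\sigma_i^2] \geq \underline\sigma^2$. Since on the event $\mathcal{E}$, $\max_{1\leq j\leq p}|\mathbb{E}_n[z_{ij}^2(\varepsilon_i^2 - \sigma_i^2)]| \leq t \leq \underline\sigma^2/2$, in view of Step 1, on this event, we have

$$\begin{aligned}|W - W_0| &\leq |W - W_1| + |W_1 - W_0|\\ &\leq CtW_1 + |W_1 - W_0|\\ &\leq Ct\max_{1\leq j\leq p}\Big|\sum_{i=1}^n z_{ij}\varepsilon_ie_i/\sqrt{n}\Big| + C\max_{1\leq j\leq p}\Big|\sum_{i=1}^n z_{ij}(\hat\varepsilon_i - \varepsilon_i)e_i/\sqrt{n}\Big|.\end{aligned}$$

Therefore, by (38) and (39), on the event $\mathcal{E}$ we have

$$\mathrm{P}_e\big(|W - W_0| > Ct\sqrt{\log(pn)}\big) \leq 2n^{-1}.$$

By choosing $t = (\log(pn))^{-1}n^{-c}$ with sufficiently small $c > 0$, we obtain the claim of this step.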

Step 3. Steps 1 and 2 verified conditions (14) and (15) in Section 3. Theorem H.1 case (a) follows from Corollary 3.1-(iv). ■